NOTATION


The following notations are used throughout this thesis.

⊓          The meet operation in a lattice space.

∪          Set union.

∩          Set intersection.

⊂          Subset.

|          Bit (or bitwise) OR.

&          Bit (or bitwise) AND.

∨          Logical OR (of assertions).

∧          Logical AND (of assertions).

∀          For all members of a set.

∈          Member of a set.

∅          The empty set.

[•]        Denotes a subscript.

<•,•,•>    Denotes a binary relation.

[<English statement>]

Denotes that the value associated with the English
statement is evaluated in an unspecified way.

|•|        Denotes the cardinality of the set or graph.
1     INTRODUCTION


This thesis develops iterative global flow techniques for the
purpose of detecting anomalies in source programs. Currently
existing iterative global flow techniques are presented as
background; then improvements are presented, which modify the
structure of Kam & Ullman's algorithm ([KAM76]) in order to enhance
its speed of convergence. The improvements do not change the basic
properties of the iterative global flow technique. The improved
technique is applicable to a wide range of global flow problems,
including those found in compiler optimization.

The anomalies to be detected are associated with program
constructs that are legal (compile-time constructs) within the
source language, but violate basic programming concepts. These
anomalies do not necessarily constitute errors, but are often
associated with logic or execution errors of various forms. A set
of anomalies, which are language independent and representative of
the type of anomaly commonly found in students' programs, is
identified.


Three specific global flow frameworks, which are based on the
improved iterative global flow technique, are developed and used as
tools in the detection of the anomalies. Detection schemes are
developed that are able to find specific instances of the anomalies
without any knowledge of the algorithm the student is attempting to
implement. These specific frameworks are also applicable to a wide
range of global flow problems.


1.1  Thesis  Overview 


Global Flow Techniques

Many techniques have been developed in order to solve the
problems inherent in compiler optimization. Some optimizations can
be performed within the confines of a single statement ([BUS69],
[...70]); other optimizations, commonly known as global
optimization, deal with the global properties of data flow
throughout the program. Two basic underlying techniques, interval
analysis ([AHO70], [AHO76], [ALL70], [ALL76], [COC70], [EAR72],
[EAR74], [HEC72], [HEC73], [HEC74], [KEN71], [ULL72]) and iterative
analysis ([FON75], [GRA75], [KAM76], [KIL73]), dominate the field of
global optimization. Both use the concept of a flow graph of the
source program as a basic tool for performing their analysis.

Interval analysis decomposes the flow graph into disjoint
single-entry subgraphs, which are more easily analyzed than the
original flow graph. In essence, each single-entry subgraph is
analyzed separately and information about the entire program is
determined by "splicing" these separate analyses together by use of
a sequence of derived graphs. Interval analysis usually requires
that the underlying flow graph be reducible ([HEC72], [HEC74]),
although some newly developed techniques have shown that this is
not always required ([ALL76]). Interval analysis techniques have
been successfully used in solving many compiler optimization
problems, and have been the predominant approach in the past.
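The decomposition step can be illustrated with a short Python sketch of the classical interval partition. The thesis gives no code here, so the representation (a dict mapping every node to its successor list) and the node names are assumptions for illustration. Each interval begins at a header and repeatedly absorbs any node whose predecessors all lie inside it; a node reached from an interval but not absorbed becomes a new header. Building the sequence of derived graphs is omitted.

```python
from typing import Dict, List


def intervals(succ: Dict[str, List[str]], entry: str) -> List[List[str]]:
    """Partition a flow graph into single-entry intervals.

    succ maps every node (hypothetical names) to its list of successors.
    """
    preds: Dict[str, List[str]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            preds[s].append(n)

    headers = [entry]            # worklist of interval headers, in discovery order
    owner: Dict[str, str] = {}   # node -> header of the interval containing it
    result: List[List[str]] = []
    while headers:
        h = headers.pop(0)
        interval = [h]
        owner[h] = h
        grew = True
        while grew:              # absorb nodes whose predecessors are all inside
            grew = False
            for n in succ:
                if n in owner or n == entry:
                    continue
                if preds[n] and all(owner.get(p) == h for p in preds[n]):
                    interval.append(n)
                    owner[n] = h
                    grew = True
        for m in interval:       # successors left outside become new headers
            for s in succ[m]:
                if s not in owner and s not in headers:
                    headers.append(s)
        result.append(interval)
    return result


# Entry E, a loop between T and B, and an exit node X.
g = {"E": ["T"], "T": ["B", "X"], "B": ["T"], "X": []}
print(intervals(g, "E"))  # [['E'], ['T', 'B', 'X']]
```

The loop formed by T and B, together with its exit X, becomes one single-entry interval headed by T.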


Recently, iterative global flow techniques have been developed
([KAM76], [KIL73]) that seem to be applicable to a wider range of
problems and are comparable (in both time and space) to the
interval analysis techniques. In these techniques, information
associated with each node is iteratively propagated through the
flow graph, and the analysis converges when the information at each
node stabilizes; i.e., the information associated with each node
does not change as a subsequent iteration is performed.

This thesis develops iterative global flow techniques as a
tool for detecting anomalies in students' programs. Kam & Ullman
([KAM76]) have developed a specific form of Kildall's algorithm
([KIL73]), which processes the nodes of the flow graph in a
predetermined order to ensure rapid convergence under certain
conditions. This thesis presents an improved form of Kam &
Ullman's algorithm, which further enhances the speed of
convergence. The actual improvement for a given flow graph depends
on the structure of the flow graph and the information being
propagated through the flow graph, but in many cases the improved
algorithm converges twice as fast as that of Kam & Ullman.
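The iterative scheme can be sketched as a round-robin solver. The Python below uses reaching definitions encoded as integer bit vectors; the problem, graph, and names are illustrative assumptions. It revisits every node in a fixed order until a pass changes nothing. Kam & Ullman order the nodes depth-first; the sketch uses reverse postorder as that order. It is neither their algorithm verbatim nor the improved form presented in this thesis.

```python
from typing import Dict, List, Set, Tuple


def reverse_postorder(succ: Dict[str, List[str]], entry: str) -> List[str]:
    """Depth-first search from entry; nodes listed in reverse finishing order."""
    seen: Set[str] = set()
    finished: List[str] = []

    def dfs(n: str) -> None:
        seen.add(n)
        for s in succ[n]:
            if s not in seen:
                dfs(s)
        finished.append(n)

    dfs(entry)
    return finished[::-1]


def reaching_defs(order: List[str], succ: Dict[str, List[str]],
                  gen: Dict[str, int], kill: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Round-robin iteration: OUT[n] = gen[n] | (IN[n] & ~kill[n]),
    where IN[n] is the OR of OUT[p] over all predecessors p.
    Returns the OUT bit vectors and the number of passes, counting the
    final pass that detects no change."""
    preds: Dict[str, List[str]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            preds[s].append(n)
    out = {n: 0 for n in succ}
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for n in order:
            in_n = 0
            for p in preds[n]:
                in_n |= out[p]
            new = gen[n] | (in_n & ~kill[n])
            if new != out[n]:
                out[n], changed = new, True
    return out, passes


succ = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
gen = {"A": 0b01, "B": 0, "C": 0b10, "D": 0}   # definition d0 in A, d1 in C
kill = {n: 0 for n in succ}
rpo = reverse_postorder(succ, "A")              # ['A', 'B', 'C', 'D']
out_rpo, n_rpo = reaching_defs(rpo, succ, gen, kill)
out_bad, n_bad = reaching_defs(rpo[::-1], succ, gen, kill)
print(n_rpo, n_bad)  # 3 5
```

Both orders reach the same solution on this small loop. Reverse postorder needs 3 passes against 5 for the reversed order, which shows how node ordering alone affects convergence.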

Specific Frameworks

The concept of iterative global flow analysis discussed above
deals with the mechanism used to propagate information through the
flow graph. The type of the information to be propagated and the
way in which it is propagated through the flow graph are restricted
by the method, but many different realizations fall within these
restrictions. Three specific frameworks, property P propagation,
call graph analysis, and invariant assertion analysis, which use
the general concept of iterative global flow, are developed in this
thesis. They are sufficient to detect the set of anomalies that
are presented in this thesis and are applicable to a wide range of
global flow and compiler optimization problems.


Property P Propagation

Property P propagation addresses global flow problems in which
the information to be propagated through the flow graph can be
encoded within a single bit. This application can be efficiently
implemented with bit vectors (i.e., encoding many pieces of
information within a single word of memory) and the bitwise Boolean
operations available on most computers. The two specific
realizations of property P propagation that are presented provide a
uniform framework into which many of the classical global flow
problems fall (e.g., live variable analysis, common subexpression
detection, and dominance and ancestral relationships). The way in
which bit information is propagated through the flow graph is
inherently specified within the framework of the analysis and helps
to identify the data dependency that the specific analysis is
designed to uncover.
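Dominance, one of the classical problems listed above, fits this pattern directly: "node m dominates node n" is one bit per pair. The following Python sketch is an illustration, not one of the thesis's two realizations. It stores each node's dominator set as an integer bit vector, uses bitwise AND to meet over predecessors and bitwise OR to add the node itself, and iterates until nothing changes. It assumes every node is reachable from the entry; node names are hypothetical.

```python
from typing import Dict, List, Set


def dominators(nodes: List[str], preds: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Dominator sets, one integer bit vector per node (bit i <-> nodes[i])."""
    bit = {n: 1 << i for i, n in enumerate(nodes)}
    all_ones = (1 << len(nodes)) - 1
    dom = {n: all_ones for n in nodes}     # start from "every node dominates n"
    dom[entry] = bit[entry]                # the entry is dominated only by itself
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            meet = all_ones
            for p in preds[n]:
                meet &= dom[p]             # bitwise AND over predecessors
            new = meet | bit[n]            # bitwise OR adds n itself
            if new != dom[n]:
                dom[n], changed = new, True
    # Decode the bit vectors back into sets of node names.
    return {n: {m for m in nodes if dom[n] & bit[m]} for n in nodes}


nodes = ["E", "H", "B", "X"]                  # entry, loop header, loop body, exit
preds = {"E": [], "H": ["E", "B"], "B": ["H"], "X": ["H"]}
doms = dominators(nodes, preds, "E")
print(sorted(doms["X"]))  # ['E', 'H', 'X']
```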

Because of the efficiency with which this type of analysis can
be executed, it is often advantageous to decompose a complex global
flow problem into a set of simpler flow problems to which property
P propagation can be applied, instead of solving the original
problem with a more complex global flow technique. This approach
is applied in several of the detection schemes that are developed
in Chapter 4.

Call  Graph  Analysis 

When subroutine calls (with argument lists) are part of the
source language (and, thus, are present in the flow graph), some
mechanism for analyzing the separate subroutines and transferring
information between them must be developed. The concept of a call
graph, which encodes the dependency between subroutines, is
developed and iterative global flow techniques are applied to it.
A framework is developed that performs the elementary global flow
analysis on the individual flow graphs of the subroutines in a
natural order and transmits the required flow information to the
called or calling subroutine (for use in its analysis). This
framework effectively handles recursive subroutine calls and
built-in functions for which a flow graph is not available.
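One piece of information such a framework might transmit between subroutines is the set of variables a routine may modify, either directly or through the routines it calls. The Python sketch below is a hypothetical illustration, not the thesis's framework. It treats all names as global (e.g., COMMON) variables, ignores the binding of arguments to parameters, and simply iterates over the call graph until no summary grows; this is what lets mutually recursive routines converge. Built-in functions, which have no flow graph, receive fixed summaries supplied by hand.

```python
from typing import Dict, Set


def modified_sets(local_mods: Dict[str, Set[str]],
                  calls: Dict[str, Set[str]],
                  builtin_mods: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Variables each routine may modify, directly or through its callees."""
    summary = {r: set(v) for r, v in local_mods.items()}
    summary.update({b: set(v) for b, v in builtin_mods.items()})  # fixed summaries
    changed = True
    while changed:              # fixed-point iteration, so call cycles converge
        changed = False
        for r in local_mods:
            for callee in calls.get(r, set()):
                extra = summary[callee] - summary[r]  # KeyError if callee has no summary
                if extra:
                    summary[r] |= extra
                    changed = True
    return summary


local = {"MAIN": {"X"}, "A": {"Y"}, "B": {"Z"}}           # hypothetical routines
calls = {"MAIN": {"A"}, "A": {"B"}, "B": {"A", "SQRT"}}   # A and B are mutually recursive
builtins = {"SQRT": set()}                                # no flow graph; summary supplied
result = modified_sets(local, calls, builtins)
print(sorted(result["MAIN"]))  # ['X', 'Y', 'Z']
```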

Invariant  Assertion  Analysis 

Invariant assertion analysis deals with the automatic
generation of assertions about the statements and/or variables of
the source program ([ELS72], [HOA69], [KAT76], [WEG74], [WEG75]).
The particular realization that is presented in this thesis encodes
information about the relationship between variables and constants.
The entire analysis is performed without any manual insertion of
assertions. This provides a basic framework in which anomalies
involving the values of variables (e.g., division by zero, array
subscript out of bounds, taking the square root of a negative
number, etc.) can be detected. This specific framework has
applications in compiler optimization and program proving.
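The kind of check this enables can be sketched with a much simpler abstraction than the thesis's. The Python code below records only which side of zero (negative, zero, positive) each variable may lie on. It works over straight-line code without branches, so no meet at join points is needed, and the statement encoding is hypothetical. A division is flagged when its divisor may be zero, and a SQRT when its argument may be negative; variables never assigned (program inputs) are assumed to take any sign.

```python
from typing import Dict, FrozenSet, List, Tuple

Signs = FrozenSet[str]
ANY: Signs = frozenset({"-", "0", "+"})
NONNEG: Signs = frozenset({"0", "+"})


def const_sign(c: float) -> Signs:
    return frozenset({"-"}) if c < 0 else frozenset({"0"}) if c == 0 else frozenset({"+"})


def mul_signs(a: Signs, b: Signs) -> Signs:
    """Possible signs of x*y (also of x/y when y is nonzero)."""
    return frozenset("0" if "0" in (x, y) else "+" if x == y else "-"
                     for x in a for y in b)


def check(stmts: List[tuple]) -> Tuple[List[Tuple[int, str]], Dict[str, Signs]]:
    env: Dict[str, Signs] = {}
    warnings: List[Tuple[int, str]] = []

    def signs(v: str) -> Signs:
        return env.get(v, ANY)          # unassigned inputs may have any sign

    for i, (op, target, *args) in enumerate(stmts):
        if op == "const":
            env[target] = const_sign(args[0])
        elif op == "mul":
            env[target] = mul_signs(signs(args[0]), signs(args[1]))
        elif op == "div":
            den = signs(args[1])
            if "0" in den:
                warnings.append((i, f"divisor {args[1]} may be zero"))
            nonzero = den - {"0"}
            env[target] = mul_signs(signs(args[0]), nonzero) if nonzero else ANY
        elif op == "abs":
            s = signs(args[0])
            env[target] = frozenset(({"0"} & s) | ({"+"} if s - {"0"} else set()))
        elif op == "sqrt":
            s = signs(args[0])
            if "-" in s:
                warnings.append((i, f"SQRT argument {args[0]} may be negative"))
            env[target] = (s & NONNEG) or NONNEG
        else:
            raise ValueError(f"unknown operation {op!r}")
    return warnings, env


prog = [
    ("const", "D", 0),       # D = 0
    ("mul", "Y", "X", "D"),  # Y = X*D   (X is an input)
    ("div", "Q", "X", "Y"),  # Q = X/Y
    ("const", "N", -4),      # N = -4
    ("sqrt", "R", "N"),      # R = SQRT(N)
    ("abs", "A", "X"),       # A = ABS(X)
    ("sqrt", "S", "A"),      # S = SQRT(A), not flagged
]
warnings, env = check(prog)
print(warnings)  # [(2, 'divisor Y may be zero'), (4, 'SQRT argument N may be negative')]
```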

Anomalies 

The production of skillful programmers is a desirable
side-effect of any computer science course that employs programming
problems as a basic teaching tool. Because of limited teaching
resources (e.g., teaching assistants, graders, and consultants), the
use of an intelligent compiler capable of advising the student
about undesirable properties of his programming would provide a
powerful teaching tool ([NIE74]). Several existing compiler
systems do present good syntax error diagnostics, perform some form
of syntax correction and give understandable error diagnostics
([CON73], [CRR70]). Systems have been developed to help guide the
student through the resolution of a syntax or execution error
([TTM75], [DAV75]) and to guide him through the structured solution
of a given problem ([DAN75], [MAT76], [HYD75]), but they do not
address anomalies present in programs; i.e., combinations of
constructs that are legal in the source language but are
inefficient, ineffective, or considered poor programming practice.

By detecting such anomalies, which do not necessarily
constitute errors, the student's attention can be directed to
regions of his program where logical errors may be present. It
then becomes the student's responsibility to analyze his program to
determine the error, if there exists one, that induced the anomaly.


This thesis presents the elements for a pedagogical component
of a compiler system ([WIL76]) capable of detecting such program
anomalies without any knowledge of the algorithm being implemented.
The concepts presented, while applicable to a batch compiler
system, are intended for use within an interactive compiler system
(such as that described in [WIL76]; see Sections 1.3 and 5.1
below).


1.2  Background On Anomalies


Students learning their first programming language normally
have a very "narrow" view of the problem solving process. They
learn the general function of individual statements in the
particular language, but they are not familiar with all the
language features and lack the insight to select the most
appropriate language constructs for the particular problem they
must solve.


"Poorly  structured"  programs  are  often  produced  because 
students:

•  start coding before they understand all aspects of the
problem,

•  program piecemeal and add "fixes" to patch up incomplete
algorithms instead of restructuring or changing the basic
algorithm, and

•  view their program as a series of essentially unconnected
statements without reflecting on the more global aspects of
their program.

Here, "poorly structured" refers not only to control flow but also
to inefficient, ineffective, or erroneous data flow. Some of the
reasons why this occurs are:

•  lack  of  experience, 

•  material  presented  to  the  student  (If  the  student  is 
presented  with  erroneous  material,  he  will  produce 
erroneous  programs.), 

•  lack  of  desire  to  expand  their  own  programming  ability 
(usually  caused  by  lack  of  interest),  and 

•  misunderstandings  or  misconceptions. 

An automated system capable of performing global flow analysis
(that the student fails to perform) is clearly appropriate. A
system capable of:

•  detecting program anomalies,

•  giving detailed information about the anomalies (i.e.,
helping the student understand what is wrong), and

•  helping direct the student in correcting the anomalies

would be a valuable pedagogical tool.

Data Collection


Approximately 250 programs (solving 4 different problems) have
been collected from a basic computer science course that used
FORTRAN as its major programming language. These programs were
analyzed for defects involving:


•  inefficiency, 

•  ineffective  constructs, 

•  style, 

•  language misconceptions, and

•  algorithm defects.

Even though these programs were final copies turned in for grading
(so that many of the anomalies present in earlier versions were not
present), a surprisingly large number of defects were found. Final
statistical analysis of the data has not yet been completed, but
preliminary analysis has revealed a number of interesting
anomalies, which have been included in this thesis.


1.3  Interactive Compiler And Pedagogical Framework

An interactive compiler system provides a powerful tool for
both the production programmer and the student learning a new
language. A production programmer can quickly generate modules
(subroutines) of the system he is developing and dynamically supply
input data on-line. Because he can interact with his program, he
can supply representative data to his module and, on the basis of
the results obtained, can quickly debug and modify the function
being implemented.

By using an interactive compiler, a student learning a new
language can get immediate feedback about syntax errors. In this
way he can quickly eliminate the use of constructs that seem
logical to him but are not allowed in the language. If the student
is unsure of the semantics of a given language construct (or
sequence of constructs), he can quickly produce a test program to
answer the question for special cases that were not clear. As with
the production programmer, the student can implement and test small
segments of his program in the process of developing his final
program.

It is within this framework that the analysis-detection
schemes presented in this thesis are intended to be implemented.
On the basis of the results of a given type of analysis, the
student may modify his program hoping to resolve an anomaly. He
can then request that the same analysis (or a different analysis)
be performed to verify that the problem has been resolved. He may
find that he has created other anomalies, and it may take him
several iterations to produce a program with which he is satisfied.

The detection techniques presented in this thesis are equally
applicable within a batch compiler system, but the following
aspects of a batch system detract from the usefulness of the
analysis-detection system:

1)  The time lag (which has many components) between job
submittals may break the student's line of thought and
cause him to forget why he made a particular modification
to his program. He may, thus, reinstate a previous version
of his program, forgetting that this will reintroduce a
previously resolved anomaly.

2)  The student must decide at the time of submittal all
analyses he wishes performed on his program. On the basis
of the results of one of these analyses, the student may
decide he wants another type of analysis performed. In
order to get the second analysis, he must submit a
completely new job to the system.

3)  In most batch systems, each time a batch job is submitted
all analyses, including syntax checking,
compiler translation, and graph generation, must be
repeated. This represents a significant overhead when
compared to an interactive system, which can use knowledge
of the previous program structure to determine the current
program structure.

The philosophy behind the interaction with the student is an
important consideration and should be developed using the following
guidelines. The student should never be told that he should or
must apply a given transformation. Instead, he should be informed
that the given transformation will enhance the structure or
execution of his program in a specific way.

For instance, given the code sequence

        I = 1
        DO 10 I = 1,5

the detection schemes can determine that the value 1 assigned to I
in the statement "I = 1" is unreferenced because the variable I is
assigned again in the next statement. Thus, the assignment
statement can be deleted from the program, producing equivalent but
more efficient execution. The statement

        "I = 1 should be deleted"

should not be presented to the student. Instead, a statement
indicating why the assignment statement is ineffective and the
effect of deleting the statement should be presented.

Since the analysis-detection system has no knowledge of the
algorithm being implemented, it cannot really determine what
transformations should be applied. For instance, the program being
analyzed may be only a partial solution to the problem the student
is solving. In the example above the student may intend to
subsequently insert code, which references (and possibly modifies)
I, between the assignment statement and the DO statement. Thus,
telling the student that he should delete the assignment statement
may be misleading.

It should be realized that any suggestion given by the system
will greatly influence the action taken by the student.
Suggestions should indicate why a given transformation might be
applied and the effect of applying the transformation.


The decision to apply a given transformation should always be
left to the student. In this respect the philosophies behind
compiler optimization and the analysis-detection system differ
greatly. In compiler optimization the purpose is to make the user's
program execute more efficiently, and transformations can be
applied without informing the user. Since the application of
transformations is essentially invisible to the user, several
different transformations can be simultaneously applied within the
same region of the program, producing vastly different (but
supposedly equivalent) code. In the analysis-detection system
being discussed here, however, the purpose is to inform the student
about properties of his program and to indicate ways that he can
improve his programming. Since the purpose is to make program
transformations very visible to the student, they should be
presented to the student one at a time so that he can understand
each step of the simplification process. At each step, the student
(and not the system) should be the one to decide whether the
suggested transformation is applicable to the specific situation.


1.4  Thesis  Statement 


Many program anomalies associated with:

•  programming style,

•  efficiency, and

•  algorithm or language misunderstanding or misconception

can be detected by an automated system having no knowledge of the
user algorithm being implemented. The three basic iterative global
flow frameworks to be developed in this thesis are sufficient to
detect a large portion of these anomalies.

An intelligent compiler system, which incorporates the
techniques developed in this thesis, can be designed and
implemented. Such a compiler system would be a useful tool in
teaching the general techniques of computer programming and the use
of specific constructs within the language.
Chapter 1 has presented some background about why students
produce the anomalies that they do and has discussed a basic design
philosophy by which detection schemes might be incorporated.
Chapter 2 defines and presents specific examples of anomalies
produced by students that can be detected by the use of iterative
global flow analysis. Chapter 3 presents the basic underlying
techniques that are used in detecting the anomalies defined in
Chapter 2. Chapter 4 develops specific detection outlines for each
of the anomalies defined in Chapter 2 using the tools presented in
Chapter 3. Chapter 5 deals with implementation aspects of the
techniques developed in Chapters 3 and 4 and presents a system
design into which the detection schemes can be incorporated.
Chapter 6 discusses conclusions, other applications of the
techniques developed, and further work to be done. Appendix I
formally defines the concept of a flow graph and its associated
nomenclature. Appendix II contains a glossary of terms used
throughout the thesis.
2  CONSTRUCTS COMMONLY ASSOCIATED WITH ANOMALIES


This chapter identifies and describes specific anomalies
representative of those commonly found in students' programs. Each
is language independent and can be detected without any knowledge
of the algorithm the student is attempting to implement. These
anomalies have been identified by analyzing the set of programs
mentioned in Chapter 1 and by applying the experience obtained by
working in a program consulting office over a period of three
years.


These anomalies do not necessarily constitute errors, but are
often associated with errors. If an error is present, it cannot be
explicitly identified since such an identification may require some
knowledge of the algorithm being implemented. The purpose of
detecting the anomalies is to direct the student's attention to
regions of his program where errors may be present. It then
becomes the student's responsibility to analyze his program to
determine why the anomaly is present, and, if applicable, to
resolve the error.

In order to emphasize the reason for detecting these anomalies
and bringing them to the student's attention, logical errors
commonly associated with each anomaly are presented along with
specific examples. Figure 2.1 contains a subroutine implementing
the binary chop method of root finding and will be used to present
some specific examples of anomalies to be detected. This is the
type of code many beginning FORTRAN programmers produce as a final
product (i.e., turn in to be graded). It should be realized that
L1          SUBROUTINE BINCHP(XL,XR,EPS,DELTA,ROOT)
L2          YL = F(XL)
L3          YR = F(XR)
L4          IF(YL*YR.GT.0) GOTO 10
L5     20   ITER = 0
L6          IF(ABS(XR-XL).LE.EPS) GOTO 30
L7          XM = (XL+XR)/2.
L8          YM = F(XM)
L9          ITER = ITER + 1
L10         DELTA = ABS(XR-XL)/2.
L11         PRINT, ITER, XM, DELTA
L12         IF(YL*YM.LT.0.) GOTO 40
L13         XL = XM
L14         YL = YM
L15         GOTO 20
L16    40   XR = XM
L17         YR = YM
L18         GOTO 20
L19    30   ROOT = XM
L20    10   RETURN
L21         END

Figure 2.1
SAMPLE PROGRAM


the global flow techniques to be developed can detect the anomalies
to be presented in this thesis within arbitrarily complex code
sequences. Simple code sequences are used as examples in this
chapter because they more clearly display the essence of the
anomaly to be detected.


2.1  Unreferenced Data


Unreferenced data occurs when:

•  at a specific statement, S, a value, D, is assigned to a
variable, V, and

•  that value, D, is not referenced by any statement of the
program subsequent to S. This can happen in either of two
ways (or a combination of them):

1)  variable V is reassigned a value prior to a reference,
or

2)  the variable V is never referenced; i.e., an "exit"
is encountered prior to a reference.
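One standard way to find unreferenced data is live-variable analysis: a value assigned to V at S is unreferenced exactly when V is not live on exit from S. The Python sketch below assumes a hand-built, statement-level flow graph. Each node lists the variables it assigns, the variables it references, and its successors; the node names and encoding are hypothetical. A DO loop's implicit increment and test are modeled as separate nodes. Variables still needed after the routine returns (such as output arguments) can be passed in as live at exit.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# node name -> (variables assigned, variables referenced, successor nodes)
Flow = Dict[str, Tuple[Set[str], Set[str], List[str]]]


def live_out(flow: Flow, live_at_exit: FrozenSet[str] = frozenset()) -> Dict[str, Set[str]]:
    """Backward live-variable analysis, iterated until no set changes."""
    live_in: Dict[str, Set[str]] = {n: set() for n in flow}
    out: Dict[str, Set[str]] = {n: set() for n in flow}
    changed = True
    while changed:
        changed = False
        for n in reversed(list(flow)):   # backward problem: later nodes first
            defs, uses, succs = flow[n]
            new_out = set(live_at_exit) if not succs else set()
            for s in succs:
                new_out |= live_in[s]
            new_in = uses | (new_out - defs)
            if new_out != out[n] or new_in != live_in[n]:
                out[n], live_in[n] = new_out, new_in
                changed = True
    return out


def unreferenced(flow: Flow, live_at_exit: FrozenSet[str] = frozenset()) -> List[Tuple[str, str]]:
    """(node, variable) pairs whose assigned value no later statement references."""
    out = live_out(flow, live_at_exit)
    return [(n, v) for n, (defs, _, _) in flow.items()
            for v in sorted(defs) if v not in out[n]]


if_else = {
    "IF":  (set(), {"I"}, ["K5", "K6"]),   # IF (I.EQ.3) ...
    "K5":  ({"K"}, set(), ["K6"]),         # ... K = 5
    "K6":  ({"K"}, set(), ["OUT"]),        # K = 6
    "OUT": (set(), {"K"}, []),             # PRINT, K
}
do_loop = {
    "INIT": ({"I"}, set(), ["DO"]),                # I = 1
    "DO":   ({"I"}, set(), ["BODY"]),              # DO 10 I = 1,5 (sets I to 1)
    "BODY": ({"SUM"}, {"SUM", "I"}, ["INCR"]),     # 10 SUM = SUM + A(I)
    "INCR": ({"I"}, {"I"}, ["TEST"]),              # implicit I = I + 1
    "TEST": (set(), {"I"}, ["BODY", "EXIT"]),      # implicit loop test
    "EXIT": (set(), {"SUM"}, []),                  # later use of SUM
}
print(unreferenced(if_else))  # [('K5', 'K')]
print(unreferenced(do_loop))  # [('INIT', 'I')]
```

In both examples the flagged value is overwritten before any reference (case 1). A variable that is never referenced again before an exit (case 2) is flagged the same way, since it is not live there.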

This anomaly is commonly associated with the following logic
errors:

•  a misconception about the use of language constructs,

•  a misconception about the algorithm being implemented,

•  the use of one actual variable for two logically different
variables,

•  the use of two actual variables for one logical variable,
and

•  extraneous variables left over as a result of previous
algorithm approaches.


Language Misconception


Consider a code sequence of the form

        I = 1
        DO 10 I = 1,20

Students generating this type of code typically use the following
logic. The student knows that a loop is being constructed and that
I is the induction variable. The induction variable must be
initialized outside the loop and, not realizing that the DO loop
performs this action, the student supplies the initialization.
As a second example consider the code sequence

        IF (I.EQ.3) K = 5
        K = 6

The student is probably trying to implement an IF-THEN-ELSE. The
student's logic might run as follows: if K is assigned in the IF
statement, the compiler should realize that the assignment to K in
the next statement should be skipped.


Algorithm Misconception

An example of this occurs in Figure 2.1. The value of YR
assigned at L17 is not referenced within any descendant of L17.
The student probably included this assignment statement because it
makes the handling of the movement of the right end point symmetric
with that of the left end point.

One Variable For Two Logical Variables


Consider the code sequence

        DO 10 I = 1,10
        SUM = 0
        DO 20 I = 1,5

Such a sequence represents the beginning of two nested loops.
Clearly, two separate variables should be used as induction
variables.
Two Variables For One Logical Variable

Consider the code sequence

        SUM = 0
        DO 10 I = 1,5
     10 SUMS = SUMS + A(I)

The initialization of SUM to zero outside the loop will be
detected as unreferenced data.

Extraneous  Variable 

Consider the code sequence

        SWITCH = 0

in which no further reference to SWITCH is made. Such a variable
may be left over from a previous approach at solving the problem.


2.2  Uninitialized Variable


A variable, V, referenced at a specific statement, S, may be:

•  totally uninitialized; i.e., no execution path from the
beginning of the program to S assigns a value to V, or

•  partially uninitialized; i.e., there is at least one
execution path from the beginning of the program to S which
does not assign a value to V.
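The two cases correspond to two forward flow problems over the same graph. V is totally uninitialized at S if no path assigns it, a "some path" problem solved with union. It is partially uninitialized if it is assigned on some but not all paths, an "all paths" problem solved with intersection. The Python sketch below is an illustration under those assumptions and reports the two cases disjointly. Arguments are modeled as assigned by an ENTRY node, node names and encoding are hypothetical, and the first example is a simplified slice of Figure 2.1 (only the XL = XM branch is kept).

```python
from typing import Dict, List, Set, Tuple

Flow = Dict[str, Tuple[Set[str], Set[str], List[str]]]  # node -> (defs, uses, successors)


def classify_uses(flow: Flow, entry: str) -> List[Tuple[str, str, str]]:
    """Report references that are totally or partially uninitialized."""
    variables: Set[str] = set().union(*(d | u for d, u, _ in flow.values()))
    preds: Dict[str, List[str]] = {n: [] for n in flow}
    for n, (_, _, succs) in flow.items():
        for s in succs:
            preds[s].append(n)

    # may_in: assigned on SOME path to n (union); must_in: on ALL paths (intersection).
    may_in: Dict[str, Set[str]] = {n: set() for n in flow}
    must_in: Dict[str, Set[str]] = {n: set(variables) for n in flow}
    must_in[entry] = set()
    changed = True
    while changed:
        changed = False
        for n in flow:
            if n == entry:
                continue
            may_outs = [may_in[p] | flow[p][0] for p in preds[n]]
            must_outs = [must_in[p] | flow[p][0] for p in preds[n]]
            new_may = set().union(*may_outs)
            new_must = set.intersection(*must_outs) if must_outs else set()
            if new_may != may_in[n] or new_must != must_in[n]:
                may_in[n], must_in[n] = new_may, new_must
                changed = True

    report = []
    for n, (_, uses, _) in flow.items():
        for v in sorted(uses):
            if v not in may_in[n]:
                report.append((n, v, "totally uninitialized"))
            elif v not in must_in[n]:
                report.append((n, v, "partially uninitialized"))
    return report


binchp = {
    "ENTRY": ({"XL", "XR", "EPS"}, set(), ["TEST"]),         # arguments defined on entry
    "TEST":  (set(), {"XL", "XR", "EPS"}, ["MID", "DONE"]),  # IF(ABS(XR-XL).LE.EPS) GOTO 30
    "MID":   ({"XM"}, {"XL", "XR"}, ["MOVE"]),               # XM = (XL+XR)/2.
    "MOVE":  ({"XL"}, {"XM"}, ["TEST"]),                     # XL = XM
    "DONE":  ({"ROOT"}, {"XM"}, []),                         # 30 ROOT = XM
}
theta = {
    "ENTRY": ({"A", "R"}, set(), ["S1"]),
    "S1":    ({"THETA"}, {"A", "R"}, ["S2"]),                # THETA = ATAN(A/R)
    "S2":    (set(), {"THEDA"}, []),                         # PRINT, THEDA
}
print(classify_uses(binchp, "ENTRY"))  # [('DONE', 'XM', 'partially uninitialized')]
print(classify_uses(theta, "ENTRY"))   # [('S2', 'THEDA', 'totally uninitialized')]
```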
This anomaly is often associated with the following logic
errors:

•  a misconception about the algorithm,

•  a misconception about the use of a language construct, and

•  the use of two separate variables for one logical variable.

Algorithm Misconception

Consider XM referenced at L19 of Figure 2.1. If YL*YR is not
positive at L4 and XL and XR are sufficiently close upon entry to
the subroutine, i.e., |XR-XL| < EPS, then the flow of control is
(L1, L2, L3, L4, L5, L6, L19, L20). This execution path leaves XM
uninitialized when referenced at L19 and, thus, an erroneous root
is returned.

Language  Misconception 

Consider the code sequence

        DO 10 I = 1,N
     10 SUM = SUM + A(I)
        PRINT, I, SUM

Upon exit from the DO loop (in FORTRAN), the index variable, I, is
undefined and should not be referenced in the PRINT statement.


Two Variables For One Logical Variable

Consider the code sequence

        THETA = ATAN(A/R)
        PRINT, THEDA

THEDA referenced in the PRINT statement is uninitialized. The
anomaly occurs because THETA and THEDA represent the same logical
variable.


2.3  Program Structure


Students will often fail to use language constructs providing
specialized control structure, such as loops and IF-THEN-ELSEs.
Instead, they use low level constructs, such as GOTOs and labels, to
implement the corresponding high level construct. This may occur
because the student is unaware of, or unfamiliar with, the given
high level construct, or the student may be unaware that he is
simulating a given high level construct. In any event the student
should be made aware that a high level construct is available to
him for implementing the desired flow of control.

Each language has its own specialized control constructs, but loops and IF-THEN-ELSE constructs are almost universally present in high level languages. Detection schemes for these two constructs will be developed ([BAK76]).


2.4  Common  Expression  Detection 

Students often calculate expressions with exactly the same value in several places in their program. Such duplications can be automatically detected. The purpose of bringing this to the student's attention is not to produce more efficient code (since an optimizing compiler will eliminate such redundant computations) but to help the student better understand how information flows through his program.
Example: 

The value of ABS(XR-XL) computed at L6 is exactly the same as that computed at L10. A temporary variable can be used to transfer this value to the two places it is used.

This anomaly usually occurs because the student approaches the solution to the problem piecemeal. As he adds new code sequences to his program, he calculates values without realizing that they may be available from other regions of his program.


2.5  Local Variable In Parameter List


Students will often place a local variable of the subroutine in the parameter list. This can often be automatically detected even if the corresponding argument is actually manipulated in the calling routine (although computations involving the argument are normally completely absent).
Example:

The variable DELTA in the parameter list at L1 is probably a local variable. Since DELTA is assigned prior to any reference, it cannot be an input variable. If the value of DELTA returned to the calling routine is never referenced (see Section 2.1), it cannot be an output variable, and it can be concluded that DELTA is a local variable.

Local variables in a parameter list normally occur for one of two reasons: either the student mistakenly believes that all variables used in the subroutine must appear in the parameter list, or the variable is a remnant of a previous approach to solving the problem.


2.6  Modification  Of  Input  Parameter 

It is generally considered poor programming practice to modify an input parameter of a subroutine in a language that uses call by reference to implement parameter passing (of course, a parameter may be used for both input and output). Such a practice can cause erroneous results if the corresponding argument is subsequently referenced with the expectation that it still has its original value. Even if the user realizes that the argument has been modified, extra computations may be required to recalculate the original value if this value is subsequently required.
Example: 

XL and XR in the parameter list at L1 are clearly input parameters since they are referenced before they are assigned. If the values returned to the calling routine are referenced, the programmer may incorrectly assume he is referencing the original input values. If the returned values are never referenced (i.e., the parameters are not output parameters), program anomalies may occur when the subroutine is used in a different environment.


2.7  Anomalies  Involving  The  Value  Of  Variables 


There are a number of other miscellaneous defects detectable at compile time that can be brought to the student's attention. These defects involve the execution time value of variables, and include:

•  array subscript out of bounds,

•  parameter of a DO loop < 0 (in FORTRAN),

•  division by zero,

•  testing a condition that is uniformly TRUE or FALSE at the point of the test, and

•  non-executable code.

These and other program anomalies can be detected by applying a specific realization of invariant assertion analysis as described in Section 3.5.2.1.


2.8  Transfer Variable


A variable, V, is a transfer variable if

•  the value of an expression, X, is assigned to V, and

•  at each reference (normally only one) to V which contains the value of X, the defining components of X have the same value as when X was assigned to V.

The reason for detecting such a situation is that the assignment of X to V can be eliminated and the expression X substituted for corresponding references to V. Of course, the substitution for transfer variables at corresponding references will depend on the type of the variable (and the corresponding expression) and on implicit actions associated with the assignment operator. Although such a substitution probably produces a more efficient program, this is not the major reason for bringing this to the student's attention. The primary motivation is to help the student understand how data flows through his program.

Examples:

1)  '(XR) assigned to YR at L3 can be substituted for YR at LU (thus eliminating L3).

2)  XM assigned to ROOT at L19 can be substituted for ROOT at L1. This eliminates L19, and since no explicit action must be performed before returning, the GOTO 30 at L6 can be replaced by RETURN.

There are a number of program constructions that fall into the above definition of transfer variable but should not be brought to the student's attention. A discussion of these forms is deferred until Chapter 6.
The purpose of detecting transfer variables is to identify code sequences such as

B = A
A = B + 1

(where B is not referenced further in the program) and

QUOT = A/B
REM = A - QUOT*B

(where QUOT is not referenced further in the program). In the first code sequence, the student may be reluctant to write "A = A + 1" because, when interpreted as an equation (which is the way many students interpret assignment statements), it implies "0 = 1". In order to avoid this assumed paradox, he creates the transfer variable B. In the second code sequence, the student is implementing the mod function. In decomposing the operation into separate steps, the student has generated the unneeded variable QUOT.


2.9  Other  Anomalies 


There are a number of other types of program anomalies not addressed in this thesis, which include:

•  anomalies confined within a single statement, and

•  anomalies detectable by local flow information.

These types of anomalies are no less important than those considered in this thesis but have been excluded because their detection does not require global flow analysis. The remainder of this section presents examples of these types of anomalies.
Single Statement Anomalies

Students often write erroneous arithmetic expressions. Consider a language, such as FORTRAN, that incorporates integer division within expressions, and consider expressions of the form:

E1)  A**1./2.

E2)  (A*A + B*B)**(1/2)

E3)  1/2*X

Such expressions seldom produce the results the student expects (exponentiation binds more tightly than division, and the integer quotient 1/2 is 0): expression E1 evaluates to A/2, expression E2 evaluates to the constant 1, and expression E3 evaluates to the constant 0.

These anomalies can be detected by applying expression simplification techniques to all expressions in the program. For more detail about such anomalies, see [GIL76].
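The expression-simplification idea can be sketched with Python's standard `ast` module, using Python syntax as a stand-in for FORTRAN. Here `//` plays the role of FORTRAN integer division and, as in FORTRAN, `**` binds more tightly than `/`. The function name and the two patterns it checks are illustrative assumptions, not the method of [GIL76]. `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import List


def suspicious_subexpressions(expr: str) -> List[str]:
    """Flag constant integer divisions that fold to 0 and 'x ** 1 / c' exponent slips."""
    found = []
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if not isinstance(node, ast.BinOp):
            continue
        left, right = node.left, node.right
        # E2/E3 pattern: an integer quotient such as 1 // 2 folds to the constant 0.
        if (isinstance(node.op, ast.FloorDiv)
                and isinstance(left, ast.Constant) and type(left.value) is int
                and isinstance(right, ast.Constant) and type(right.value) is int
                and right.value != 0 and left.value // right.value == 0):
            found.append(f"{ast.unparse(node)} folds to 0")
        # E1 pattern: '**' binds tighter than '/', so a ** 1. / 2. means (a ** 1.) / 2.
        if (isinstance(node.op, ast.Div) and isinstance(left, ast.BinOp)
                and isinstance(left.op, ast.Pow)
                and isinstance(left.right, ast.Constant) and left.right.value == 1):
            found.append(f"{ast.unparse(node)} divides after raising to the power 1")
    return found
```

Each of E1, E2 (written `(a*a + b*b) ** (1 // 2)`), and E3 (written `1 // 2 * x`) produces exactly one warning.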

Local  Anomalies 

Several constructs can be detected by traversing only one edge of the flow graph. Consider the following code sequence.
IF(A.EQ.B) GOTO 10
10 PRINT, C
The evaluation of the IF during execution does not affect the execution path; its logical content is completely null. Statement 10 is executed (with no intervening action) independent of the relationship of A to B, and the IF statement can be removed from the program with no effect on its execution.
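The null IF can be recognized by comparing a conditional branch's target with the label of the statement that follows it, so that both out-edges of the IF node lead to the same node. The statement format `(label, text, branch target)` below is a hypothetical simplification.

```python
from typing import List, Optional, Tuple

# Hypothetical statement form: (label or None, statement text, branch target label or None).
Stmt = Tuple[Optional[str], str, Optional[str]]


def null_conditional_branches(program: List[Stmt]) -> List[int]:
    """Indices of IF ... GOTO statements whose target is the very next statement."""
    hits = []
    for i, (_, text, target) in enumerate(program[:-1]):
        next_label = program[i + 1][0]
        # Taken and fall-through edges reach the same node: the test has no effect.
        if text.startswith("IF") and target is not None and target == next_label:
            hits.append(i)
    return hits
```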
A second construct, which probably should not be considered an anomaly but which does make the program more difficult to read, deals with the unconditional transfer of control to an unconditional transfer of control. Consider the code sequence
IF(A.EQ.B)  GOTO  20 


20  RETURN 
A more understandable version of the program would replace the GOTO 20 with a RETURN. Many variants of this construct occur in students' programs; the RETURN could have been a STOP or another GOTO.
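The GOTO-to-GOTO construct can be flagged by following the chain of unconditional transfers to its end. The sketch assumes a hypothetical map from statement labels to statement text, and it guards against cycles such as `30 GOTO 30`.

```python
from typing import Dict


def thread_jump(label: str, stmt_at: Dict[str, str]) -> str:
    """Follow a chain of unconditional GOTOs starting at `label`; return the final label."""
    seen = set()
    while stmt_at[label].startswith("GOTO ") and label not in seen:
        seen.add(label)
        label = stmt_at[label].split()[1]
    return label


def suggest_replacement(goto_label: str, stmt_at: Dict[str, str]) -> str:
    """Suggest what 'GOTO goto_label' could become: a copied RETURN/STOP, else a direct GOTO."""
    final = thread_jump(goto_label, stmt_at)
    if stmt_at[final] in ("RETURN", "STOP"):
        return stmt_at[final]
    return f"GOTO {final}"
```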




3    DEVELOPMENT OF ITERATIVE GLOBAL FLOW TECHNIQUES

This chapter develops basic iterative global flow techniques, which will be used in Chapter 4 for detecting specific program anomalies. Kildall's algorithm and Kam & Ullman's algorithm are presented as background, and an improved form of Kam & Ullman's algorithm is developed. Then two specific frameworks, property propagation and invariant assertion analysis, which use this improved algorithm as their underlying structural tool, are developed. The improved algorithm and the frameworks based upon it are applicable to a wide variety of global flow problems.

The concept of a flow graph is the basic structure upon which the global flow techniques are based. The formal definition of a flow graph and the nomenclature associated with it can be found in Appendix I. It is assumed that the source program being analyzed has been transformed into its corresponding flow graph(s), and this thesis will reference the nodes of the flow graph instead of the source statements of the program.


For simplicity, each node of the flow graph can correspond to only one of the following elementary statements:

•  simple assignment statements (i.e., no embedded assignments),

•  input/output statements,

•  flow of control (conditional and unconditional), and

•  invocation of subroutines (and functions).

All high level constructs (such as DO loops, WHILE loops, IF-THEN-ELSEs, etc.) must be transformed into their corresponding low level constructs.


3.1  Framework  And  Assumptions 


The techniques developed in this thesis are based on the assumption that their target application is an interactive compiler system in a timesharing environment. System resource constraints are assumed to be as follows:

•  Program Space

   Program space is limited. Program overlays could be used to resolve the problem of a large program, but final acceptance and use of the system make this an undesirable solution (at least overlays should be kept to a minimum). Thus, algorithms should be easily implemented and should be restricted to small amounts of program space.

•  Data Space

   A reasonably large maximum data space is possible, but since it is used in a timesharing environment, the system's use by a large number of students dictates that data space size should also be kept to a minimum.

•  Execution Time

   The system's final acceptance and use will depend, in part, on what kind of response time the student can expect. Thus, system response time to the student should be considered when designing the structure of the algorithms.

The techniques developed in this thesis are equally applicable to a batch system; however, the two environments are significantly different. Consider the following comparisons, which are discussed in detail below.


     Interactive System                       Batch System

I)   The student can ask for additional      The student must set flags (say on a
     information if preliminary              control card) to indicate the detail
     information is not sufficient.          of information desired before he
                                             submits his job.

II)  The student can dynamically change      The student's program is fixed
     his program at any point during         (except for changes made by the
     the analysis.                           system itself).

III) The student is sitting at the           The student must wait through three
     terminal waiting for a response         time intervals:
     to his request.                         •  system queue time,
                                             •  execution time, and
                                             •  distribution time.
I)  Additional Information

Since in an interactive system the student can ask for supplementary information, only preliminary information should be presented initially. If this preliminary information is sufficient for the student to choose an appropriate action, then no further information need be collected. Otherwise, the system can be directed to collect and present more descriptive supplementary information.

Such a presentation scheme dictates that the detection algorithms should be structured in the following way:

   Each module executed in a sequence should be capable of collecting a specific "level" of information (of interest to the student), assuming that all previous modules of the sequence have been executed.

II)  Program Modification

Since the student can modify his program at any time during the analysis, information which has been collected but not yet presented may be voided by an editing change. Thus, analyses should be delayed as long as possible, but inevitably certain analyses will have to be repeated each time the student changes his program. This again dictates that algorithms should have a modular structure.
III)  Response Time

In a batch system the execution time is normally a small proportion of the student's total wait time. Thus, algorithms that execute for reasonably long periods before obtaining any results are acceptable. Such an algorithm structure is not acceptable in an interactive system.

In general, it is not desirable for an interactive system to spend a large amount of time collecting information (with no presentation to the user) and then to present a large amount of information to the user (with essentially no further system action required). The user finds long time delays frustrating and finds massive amounts of information presented on the screen intimidating ([MAR73]). Such a system is also inappropriate for efficiency reasons, since the user may make a modification that nullifies a large portion of the information that has been collected.

In view of the preceding comparisons, it is clear that the algorithms in an interactive environment should be modularized in such a way that information is collected at several "levels," with each module addressing a specific "level." Another desirable attribute is that the algorithms use bit vectors and collect information for the whole source program at once, in parallel; this allows an efficient use of the bitwise operations available on most machines. Many of the algorithms developed in this thesis have these desirable properties.
3.2  Unified Iterative Global Flow Techniques


Recently, iterative global flow techniques capable of addressing general global flow problems have been developed. Kildall's algorithm ([KIL73]) and Kam & Ullman's approach ([KAM76]) to his algorithm are paraphrased for background, and an improved form of Kam & Ullman's algorithm is presented. All three algorithms are cast within a semilattice framework to be described below.

In order to motivate the applicability of the generalized global flow algorithms to be presented, a specific global flow problem, live variable analysis, is discussed. Live variable analysis determines, for a specific variable, V, the region of the flow graph for which there exists a subsequent reference to the current value residing in V. In other words, given node n, does there exist a descendant, d, of n that references the current value of V? Such a reference exists iff there exists a descendant, d, that references V and there exists a path from n to d that contains no assign point of V (since such an assign point destroys the current value of V).


One approach to solving this problem is to "move backwards" (i.e., from successor to predecessor) through the flow graph, marking nodes by the following rules. (All nodes are originally unmarked.)

1)  If node n references variable V, then mark all predecessors of n.

2)  If node n is marked and n is not an assign point of variable V, then mark all predecessors of n.

The interpretation is that V is live at (that is, on exit from) those nodes that are marked.

Several mechanisms might be used to implement "moving" through the flow graph. One is to employ a stack to keep track of nodes that require further processing. Another is to process all nodes in a predetermined order and to make several passes over the nodes. An obvious question is "When do we stop applying the rules?"; i.e., "When does the process converge?" The general answer is "When application of the rules causes no new node to be marked."
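The two marking rules, driven by a stack as suggested above, can be sketched as follows. The flow graph is a hypothetical successor map, and `refs` and `assigns` are the nodes that reference and assign V. The loop stops exactly when no rule can mark a new node.

```python
from typing import Dict, List, Set


def live_nodes(succ: Dict[int, List[int]], refs: Set[int], assigns: Set[int]) -> Set[int]:
    """Nodes marked by the two backward rules, i.e., nodes on whose exit V is live."""
    pred: Dict[int, List[int]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            pred[s].append(n)
    marked: Set[int] = set()
    # Rule 1 seeds the stack: every node that references V marks its predecessors.
    stack = [p for n in refs for p in pred[n]]
    while stack:
        n = stack.pop()
        if n in marked:
            continue                 # already marked: the rules add nothing new here
        marked.add(n)
        # Rule 2: a marked node that does not assign V passes the mark backwards.
        if n not in assigns:
            stack.extend(pred[n])
    return marked
```

In the first test graph, V is assigned at node 1 and referenced at node 4, which is reached through either node 3 or node 5; every node except 4 is marked, including the assign point 1, because V is live on its exit.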

The iterative global flow techniques presented below represent a generalization of flow problems such as live variable analysis. The techniques incorporate a mechanism for:

•  a general information space (i.e., the information to be propagated through the flow graph),

•  propagating the information through the flow graph,

•  transforming information to correspond to internal manipulations within the nodes, and

•  determining when the analysis has converged.

Semilattice Framework

Let I be a finite information space (e.g., the set of bit vectors of length l) satisfying the following conditions with respect to the "meet" operation, ∧:

•  ∧ : I × I → I.                              (Closure)

•  For a, b, c ∈ I:

   •  a ∧ a = a                                (Idempotent)

   •  a ∧ b = b ∧ a                            (Commutative)

   •  a ∧ (b ∧ c) = (a ∧ b) ∧ c                (Associative)

•  There exists a lattice zero element, 0, such that ∀ a ∈ I, a ∧ 0 = 0.

If there exists a lattice one (1) element, then it has the property that

   ∀ a ∈ I, a ∧ 1 = a.

The set I and the ∧ operation define a ∧-semilattice. The ∧ operation defines a partial ordering on I:

   a ≤ b  iff  a ∧ b = a;
   a < b  iff  a ≤ b and a ≠ b.
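As a concrete instance (an assumption for illustration, not a space fixed by the thesis), take I to be the bit vectors of a fixed width stored as Python integers, with bitwise AND as the meet. Then the all-zeros vector is the lattice zero, the all-ones vector is the lattice one, and a ≤ b means that every bit set in a is also set in b. The sketch checks the axioms exhaustively for width 3. Choosing AND rather than OR is a convention; with OR as the meet, the roles of the all-zeros and all-ones vectors swap.

```python
from itertools import product

WIDTH = 3
ONE = (1 << WIDTH) - 1     # lattice one: all bits set
ZERO = 0                   # lattice zero: no bits set
SPACE = range(1 << WIDTH)  # every bit vector of length WIDTH


def meet(a: int, b: int) -> int:
    return a & b           # bitwise AND as the meet operation


def leq(a: int, b: int) -> bool:
    return meet(a, b) == a  # a <= b iff a meet b = a


def lt(a: int, b: int) -> bool:
    return leq(a, b) and a != b


def check_semilattice() -> bool:
    """Exhaustively verify closure, idempotence, commutativity, associativity, 0 and 1."""
    for a, b, c in product(SPACE, repeat=3):
        assert meet(a, b) in SPACE                            # closure
        assert meet(a, a) == a                                # idempotent
        assert meet(a, b) == meet(b, a)                       # commutative
        assert meet(a, meet(b, c)) == meet(meet(a, b), c)     # associative
        assert meet(a, ZERO) == ZERO and meet(a, ONE) == a    # zero and one
    return True
```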


A function

   f : N × N × I → I

(where N is the set of nodes of the flow graph G = (N,E,e)) is a flow function if it satisfies the following homomorphism property:

   f(n,s,a ∧ b) = f(n,s,a) ∧ f(n,s,b)        ∀ n,s ∈ N,  a,b ∈ I.

In the following algorithms, the meet operation can be interpreted as the means of transmitting and combining information that flows between the nodes of the flow graph. A flow function, f, reflects the internal manipulation of the information within each node of the flow graph. All of the specific lattice spaces used within this thesis contain a lattice one element and, therefore, the specification of the algorithms assumes its existence.

1  The underline indicates a lattice operation.
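A common family of flow functions in bit-vector problems has the gen/kill form f(n,s,x) = (x AND NOT kill) OR gen, where gen and kill depend only on the source node n. Under a bitwise-AND meet (an assumed choice), such a function satisfies the homomorphism property because OR distributes over AND. The sketch checks this exhaustively for 3-bit vectors and shows a function that fails; `make_flow_function` is a hypothetical name.

```python
from itertools import product
from typing import Callable

WIDTH = 3
MASK = (1 << WIDTH) - 1        # all-ones vector: the lattice one under an AND meet


def make_flow_function(gen: int, kill: int) -> Callable[[int], int]:
    """Hypothetical gen/kill flow function for one source node n (any edge (n, s))."""
    def f(x: int) -> int:
        return ((x & ~kill) | gen) & MASK
    return f


def is_homomorphism(f: Callable[[int], int]) -> bool:
    """Check f(a meet b) == f(a) meet f(b) for all a, b, with bitwise AND as the meet."""
    space = range(MASK + 1)
    return all(f(a & b) == f(a) & f(b) for a, b in product(space, repeat=2))
```

Flipping a bit (`x ^ 1`) is not a homomorphism: f(0 ∧ 1) = 1 but f(0) ∧ f(1) = 0.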

Kildall's Algorithm

In Figure 3.1:

•  G is the flow graph being analyzed.

•  e is the entry node to G.

•  in is the input lattice information associated with e.

•  LIST is a list that "drives" the execution of the algorithm. (<= indicates insertion and removal of elements from the LIST.) The first component of each list element specifies the node to which information is being propagated and the second specifies the information.

•  q.lat contains the lattice information associated with node q.

FLOW1(G,e,in)
   for every q ∈ G
      q.lat <- 1;
   rof;
   LIST <- (e,in);
   do while (LIST ≠ null)
      (q,info) <= LIST;
      if ¬(q.lat ≤ info) then
         q.lat <- q.lat ∧ info;
         for every s ∈ succ(q)
            LIST <= (s,f(q,s,q.lat));
         rof;
      fi;
   od;
END FLOW1

Figure 3.1
SPECIFICATION OF KILDALL'S ALGORITHM


When the algorithm halts, the desired flow information is attached to the nodes of the graph in q.lat.
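A direct transcription of Figure 3.1 into Python follows, offered as a sketch. The lattice, meet, flow function, and lattice one are passed in as parameters (`inp` stands for `in`, which is a Python keyword). LIST is a FIFO queue, although the figure does not fix an order. The usage example is a hypothetical available-expressions problem on 2-bit vectors with bitwise AND as the meet.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, TypeVar

L = TypeVar("L")
Node = Hashable


def flow1(succ: Dict[Node, List[Node]], e: Node, inp: L,
          f: Callable[[Node, Node, L], L],
          meet: Callable[[L, L], L], one: L) -> Dict[Node, L]:
    """Kildall-style propagation driven by a LIST of (node, information) pairs."""
    lat = {q: one for q in succ}                    # every q.lat starts at the lattice one
    work = deque([(e, inp)])                        # LIST <- (e, in)
    while work:
        q, info = work.popleft()                    # (q, info) <= LIST
        new = meet(lat[q], info)
        if new != lat[q]:                           # not (q.lat <= info): information is new
            lat[q] = new
            for s in succ[q]:
                work.append((s, f(q, s, lat[q])))   # LIST <= (s, f(q, s, q.lat))
    return lat
```

In the example, bit 0 (computed at node 1) reaches node 4 on both paths, while bit 1 (computed only at node 2) does not.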
Kildall has proven that the algorithm converges in a finite time to the "expected" result; i.e., upon completion of the analysis, the flow information, q.lat, attached to node q represents the information obtained by traversing every possible control path from e to q. He was able to show "loose" bounds on convergence, dependent on the properties of the finite semilattice, I.

Kildall's approach has at least three undesirable properties.

P1)  The number of nodes placed in LIST can be significantly large. The "freewheeling" form of the algorithm allows a particular node to have many instances of itself in LIST at a given time. Thus, large amounts of memory (and processing time) can be required.

P2)  Lattice information is placed into LIST along with its corresponding node. Since a large number of entries can exist in LIST, the total memory requirements can be large.

P3)  The "freewheeling" nature of the algorithm allows the following phenomenon to occur:

     The lattice information on the nodes in a subgraph of G may "stabilize." New information is then introduced (by considering a larger subgraph) which "perturbs" the previously stable situation. The algorithm must then devote more resources to restabilizing the information in the original subgraph.

Since this can occur again and again each time the "scope of attention" is expanded, the algorithm has a convergence bound of at least O(n**2), where n is the number of nested subgraphs.
Kam & Ullman's Algorithm

Kam & Ullman's approach eliminates these undesirable properties. They avoid the "freewheeling" property of Kildall's algorithm by ordering the nodes of the flow graph in a "natural" way (reverse postorder as determined by a depth first spanning tree) and processing the nodes in this order using multiple passes. Since the processing order is determined (and all nodes are processed during each pass), the LIST is not necessary to "drive" the algorithm. When a particular node is being processed, lattice information is collected from all of its predecessors and processed collectively.


DFST(G,e)
   for every q ∈ G
      q.visit <- '0'B;
   rof;
   i <- |G|;   /* number of nodes in G */
   e.visit <- '1'B;
   STK <- e;
   do while (STK ≠ null)
      q <= STK;
      if [there exists s ∈ succ(q) such that ¬s.visit] then
         STK <= q;
         s.visit <- '1'B;
         STK <= s;
      else
         q.rpord <- i;
         i <- i - 1;
      fi;
   od;
END DFST

Figure 3.2
SPECIFICATION OF DEPTH FIRST SPANNING TREE ALGORITHM
DFST

The depth first spanning tree (DFST) algorithm in Figure 3.2 is a restricted form of the general depth first spanning tree algorithm. Since only the order of the nodes is desired, the actual spanning tree is not generated. Order is indicated by associating a "sequence number," q.rpord (Reverse Post ORDer), with each node. An array (denoted by q[i]) of pointers to the nodes, which encodes this ordering, will also be maintained so that nodes can be quickly accessed in the desired order.
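The explicit-stack traversal of Figure 3.2 translates directly into Python. This sketch assumes that every node of G is reachable from e, as the numbering from |G| down to 1 requires. The q[i] array of the text can then be obtained by sorting the nodes by rpord.

```python
from typing import Dict, Hashable, List

Node = Hashable


def dfst_rpord(succ: Dict[Node, List[Node]], e: Node) -> Dict[Node, int]:
    """Number nodes in reverse postorder with the explicit stack of Figure 3.2."""
    visit = {q: False for q in succ}
    i = len(succ)                          # i <- |G|
    rpord: Dict[Node, int] = {}
    visit[e] = True
    stk = [e]
    while stk:
        q = stk.pop()
        s = next((t for t in succ[q] if not visit[t]), None)
        if s is not None:                  # some successor has not been visited yet
            stk.append(q)                  # push q back, then descend into s
            visit[s] = True
            stk.append(s)
        else:                              # all successors finished: q is finished
            rpord[q] = i
            i -= 1
    return rpord
```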


Let T = (N,E') be a particular DFST for G = (N,E,e). An edge of G falls into one of three classes:

1)  a forward edge, which goes from a node to a descendant in T,

2)  a back edge, which goes from a node to an ancestor (including itself) in T, or

3)  a cross edge, which goes from a node to a second node not related as an ancestor or a descendant in T.

The following discussion can be found in [KAM76].

Observation 3.1: Let T be a DFST of G, and a,b ∈ G. Then (a,b) is a back edge iff a.rpord ≥ b.rpord.


Observation 3.2: Let T be a DFST of G. Then every cycle in G contains at least one back edge.
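The edge classes and Observation 3.1 can be checked mechanically. The sketch below uses a recursive depth first search (a stylistic substitute for the explicit stack of Figure 3.2) that records preorder and postorder numbers. Node a is an ancestor of b in T, or b itself, exactly when a's preorder/postorder interval encloses b's. The function asserts Observation 3.1 for every edge it classifies.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def classify_edges(succ: Dict[Node, List[Node]], e: Node) -> Dict[Tuple[Node, Node], str]:
    """Label each edge forward, back, or cross with respect to one DFST rooted at e."""
    pre: Dict[Node, int] = {}
    post: Dict[Node, int] = {}

    def dfs(q: Node) -> None:
        pre[q] = len(pre)
        for s in succ[q]:
            if s not in pre:
                dfs(s)
        post[q] = len(post)

    dfs(e)
    n = len(post)
    rpord = {q: n - post[q] for q in post}      # reverse postorder numbers 1..n

    def is_ancestor(a: Node, b: Node) -> bool:  # a is b itself or an ancestor of b in T
        return pre[a] <= pre[b] and post[b] <= post[a]

    kinds: Dict[Tuple[Node, Node], str] = {}
    for a in post:
        for b in succ[a]:
            if is_ancestor(b, a):
                kinds[(a, b)] = "back"
            elif is_ancestor(a, b):
                kinds[(a, b)] = "forward"
            else:
                kinds[(a, b)] = "cross"
            # Observation 3.1: back edge iff a.rpord >= b.rpord.
            assert (kinds[(a, b)] == "back") == (rpord[a] >= rpord[b])
    return kinds
```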


Let T be a particular DFST of G. Let d(G,T), the loop connectedness of G with respect to T, be defined as the largest number of back edges found in any cycle-free path in G. This value will be used to define an upper bound on the convergence of Kam & Ullman's algorithm. It can be shown ([HEC74]) that d(G,T) is independent of the particular DFST T chosen if G is reducible.


FLOW2(G,e,in)
   for every q ∈ G
      q.lat <- 1;
   rof;
   e.lat <- in;
   changed <- '1'B;
   do while (changed = '1'B)
      changed <- '0'B;
      for i <- 2 to |G|        /* q[1] = e keeps the value in */
         temp <- 1;
         for every p ∈ pred(q[i])
STEP1:      temp <- temp ∧ f(p,q[i],p.lat);
         rof;
         if temp ≠ q[i].lat then
            q[i].lat <- temp;
            changed <- '1'B;
         fi;
      rof;
   od;
END FLOW2

Figure 3.3
SPECIFICATION OF KAM & ULLMAN'S ALGORITHM
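A Python rendering of Figure 3.3 follows, offered as a sketch under the assumptions that every node is reachable from e and that e has no predecessors. It also returns the number of passes so that the bound of Observation 3.3 can be compared on small examples. The usage example is the same hypothetical 2-bit available-expressions problem used for FLOW1, an acyclic graph with d(G,T) = 0.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

L = TypeVar("L")
Node = Hashable


def reverse_postorder(succ: Dict[Node, List[Node]], e: Node) -> List[Node]:
    """Nodes reachable from e in reverse postorder, so the list starts with e."""
    order: List[Node] = []
    seen = {e}

    def dfs(q: Node) -> None:
        for s in succ[q]:
            if s not in seen:
                seen.add(s)
                dfs(s)
        order.append(q)          # q is finished: record its postorder position

    dfs(e)
    return order[::-1]


def flow2(succ: Dict[Node, List[Node]], e: Node, inp: L,
          f: Callable[[Node, Node, L], L],
          meet: Callable[[L, L], L], one: L) -> Tuple[Dict[Node, L], int]:
    """Round-robin passes in reverse postorder; returns (q.lat values, number of passes)."""
    q = reverse_postorder(succ, e)
    pred: Dict[Node, List[Node]] = {n: [] for n in succ}
    for a, targets in succ.items():
        for b in targets:
            pred[b].append(a)
    lat = {n: one for n in succ}
    lat[e] = inp
    changed, passes = True, 0
    while changed:
        changed, passes = False, passes + 1
        for node in q[1:]:                               # q[0] is e, which keeps in
            temp = one
            for p in pred[node]:
                temp = meet(temp, f(p, node, lat[p]))    # STEP1
            if temp != lat[node]:
                lat[node], changed = temp, True
    return lat, passes
```

Here the algorithm stops after 2 passes, which is within the d(G,T) + 2 bound of Observation 3.3. The final pass only confirms that nothing changed.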


Algorithm


Kam & Ullman's algorithm (Figure 3.3) also has some undesirable properties.

P4)  There will normally exist nodes for which no predecessor propagates "new" information. Since all nodes of the flow graph are processed, these nodes are processed even though their lattice information will, a priori, not change.




P5)  The meet operation is performed across all predecessors to collect lattice information at the node, n, being processed. However, some of these predecessors may not propagate any "new" information; such predecessors need not be included in the meet operation.

P6)  Assume f depends only on the source of the edge (n,s), as is usually the case. In such a situation, node p (at STEP1) may have several successors, and f(p,x,p.lat) has the same value for each value of x. The transfer of lattice information through f can be handled in one of two ways:

     •  f (at a particular node, n) can be evaluated for each successor (of n), or

     •  f can be evaluated once and stored (with node n) for later use by each successor.

     Both are undesirable: the first because of execution time, and the second because of unnecessary memory usage.

Convergence

Algorithm FLOW2, in fact, converges in a finite time independent of the ordering of the nodes. The choice of processing the nodes in reverse postorder represents a "natural" ordering and ensures rapid convergence under the condition stated below.

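Observation 3.3 [KAM76]: Algorithm FLOW2 converges in at most d(G,T) + 2 iterations of the algorithm if condition (*) holds (L denotes the semilattice of lattice information and 1 its top element):

$$(*)\qquad (\forall\, n,s \in N)\,(\forall\, x \in L)\;\bigl(f(n,s,x) \ge x \wedge f(n,s,1)\bigr).$$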
A method that combines the desirable properties of Kildall's algorithm and Kam & Ullman's algorithm is presented below. This improved algorithm directs its local attention (as done in Kildall's algorithm) to those areas of the flow graph to which "new" information is being propagated, and restricts its global attention (as done in Kam & Ullman's algorithm) in such a way that local stabilization does not cause "thrashing."

Improved Algorithm

The improved algorithm in Figure 3.4 is very similar to that of Figure 3.3. As in Kam & Ullman's approach, the DFST algorithm is performed first to determine the "natural" ordering; then FLOW3 is applied to accomplish the global flow analysis.


FLOW3(G,e,in)
   for every q ∈ G
      q.lat <- 1;  q.mark <- '1'B;
   rof;
   e.lat <- in;
   do while (⌈q.mark = '1'B for some q ∈ G⌉)
      for i <- 1 to |G|
         if q[i].mark then
            q[i].mark <- '0'B;
            for every s ∈ succ(q[i])
STEP1:         t <- f(q[i],s,q[i].lat) ∧ s.lat;
               if t ≠ s.lat then
                  s.lat <- t;
                  s.mark <- '1'B;
               fi;
            rof;
         fi;
      rof;
   od;
END FLOW3


Figure 3.4
SPECIFICATION OF IMPROVED ALGORITHM
FLOW3 uses a set of bits to "mark" each node to which "new" information has been propagated. Only those nodes that are "marked" are processed. Let q be the node currently being processed. As information is propagated through a given edge, (q,s), node s is marked if "new" information is propagated to it. If q.rpord² < s.rpord, then the edge (q,s) is either a forward edge or a cross-edge; if s is "marked" it will be subsequently processed in the current pass. If q.rpord > s.rpord, then the edge (q,s) is a back edge; if s is "marked" it will be processed in the next pass (s will not be processed again in the current pass since q.rpord > s.rpord).


This improved algorithm allows information to be propagated through the flow graph but does not allow any node of a cycle to be processed more than once within the same pass. Regions of the flow graph in which lattice information has stabilized will not be processed because the nodes of these regions will not be "marked." Notice that in the processing of a node, q[i], lattice information is immediately "pushed forward" to all successors, in contrast to Kam & Ullman's approach, which "pulls" information from all predecessors. It is this feature which eliminates undesirable properties P5 and P6 of FLOW2.

If f is dependent only on q[i] and not on s, then the evaluation of f at STEP1 can be moved outside the for loop.


² Recall that this indicates the reverse postorder of the node as determined by a DFST.
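A Python sketch of FLOW3 follows, under stated assumptions rather than as the thesis's implementation: nodes are supplied already in reverse postorder of a DFST, the flow function is passed as f(q, s, x) for the edge (q, s), and "in" is attached to the entry node. Only marked nodes are processed, and each processed node pushes its contribution to all of its successors at STEP1. The function name `flow3`, the graph, and the GEN/KILL masks are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable


def flow3(order: Sequence[Node], succ: Dict[Node, List[Node]], entry: Node,
          top: int, meet: Callable[[int, int], int],
          f: Callable[[Node, Node, int], int],
          entry_in: int) -> Tuple[Dict[Node, int], int, int]:
    """Marked "push" iteration in the style of FLOW3 (Figure 3.4).

    Returns (lattice value per node, number of passes, number of node visits).
    """
    lat = {q: top for q in order}
    mark = {q: True for q in order}          # every node starts out marked
    lat[entry] = entry_in
    passes = visits = 0
    while any(mark.values()):
        passes += 1
        for q in order:                      # order = reverse postorder of a DFST
            if not mark[q]:
                continue                     # nothing new reached q since its last visit
            mark[q] = False
            visits += 1
            for s in succ.get(q, ()):
                t = meet(f(q, s, lat[q]), lat[s])    # STEP1
                if t != lat[s]:
                    lat[s] = t
                    mark[s] = True           # processed later in this pass, or in the next
    return lat, passes, visits


# Hypothetical example: edges 1->2, 2->3, 3->2 (back edge), 3->4; two-bit vectors, meet = AND.
GEN = {1: 0b01, 2: 0b00, 3: 0b10, 4: 0b00}
KILL = {1: 0b00, 2: 0b00, 3: 0b01, 4: 0b00}
SUCC = {1: [2], 2: [3], 3: [2, 4], 4: []}


def f(q: Node, s: Node, x: int) -> int:
    """Flow function on edge (q, s); it depends only on the source node q."""
    return GEN[q] | (x & ~KILL[q] & 0b11)


lat, passes, visits = flow3([1, 2, 3, 4], SUCC, 1, 0b11, lambda a, b: a & b, f, 0)
print(lat, passes, visits)
```

On this graph the sketch needs two passes and six node visits: all four nodes in the first pass, then only nodes 2 and 3, which were marked through the back edge (3,2). With one back edge on any cycle-free path, this is within the d(G,T) + 1 bound discussed below.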
Convergence

It will be shown that the improved algorithm converges in at most d(G,T) + 1 passes over the nodes of the flow graph if condition (*) is satisfied. Before this can be done, however, some notation must be developed.

Let PATH(h) be the set of paths from node 1 to node h, and let

$$\mathrm{PATH}_n(h) = \{\, p \mid p \in \mathrm{PATH}(h) \text{ and } p \text{ has at most } n-1 \text{ back edges} \,\},$$

$$\mathrm{BPATH}_n(h) = \{\, p \mid p \in \mathrm{PATH}(h),\ p \text{ has at most } n \text{ back edges, and the last edge of } p \text{ is a back edge} \,\}.$$

Let $F = \{\, f(n,s,\cdot) \mid (n,s) \in E \,\}^{*}$; i.e., the transitive closure (under composition) of all possible flow functions attached to the edges of the flow graph. Given a path p, let $f_p(x)$ be the composition of the functions encountered along path p; e.g., if p = (r,s,t), then $f_p(x) = f(s,t,f(r,s,x))$.

Upon inspection of the improved algorithm, it is clear that on the completion of a specific pass of the algorithm, the only nodes that can possibly be "marked" are the tails³ of the back edges. This is true because as a node, n, is processed, its "mark" bit is turned off, and the only way that it can be turned back on is by the processing of a subsequent node. But if a subsequent node transmits new information to node n, the edge through which the information was transmitted must be a back edge (by observation 3.1). Thus, in determining the convergence, only the set of nodes

$$H = \{\, h \mid (m,h) \in E \text{ and } (m,h) \text{ is a back edge} \,\}$$

need be considered.


³ Given an edge (a,b), a is its head and b is its tail.
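The quantities used in the convergence argument (reverse postorder, back edges, the set H, and the depth d(G,T)) can be computed for small graphs with the sketch below. It assumes that a back edge of the depth-first search is exactly an edge (m,h) with rpord(m) ≥ rpord(h), which is consistent with the discussion of rpord above, and it takes d as the largest number of back edges on any cycle-free path, found by brute-force enumeration. That enumeration is exponential and only suitable for illustration. The function name `dfst_info` and both graphs are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Node = Hashable


def dfst_info(succ: Dict[Node, List[Node]], entry: Node):
    """Return (reverse postorder, rpord, back edges, H, d) for a small flow graph."""
    seen: Set[Node] = set()
    post: List[Node] = []

    def dfs(n: Node) -> None:                # recursive depth-first search (builds the DFST)
        seen.add(n)
        for s in succ.get(n, ()):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    order = post[::-1]
    rpord = {n: i + 1 for i, n in enumerate(order)}
    # In a depth-first search, (m, h) is a back edge iff rpord[m] >= rpord[h] (self-loops too).
    back = [(m, h) for m in order for h in succ.get(m, ()) if rpord[m] >= rpord[h]]
    H = {h for _, h in back}                 # tails of back edges (footnote 3 convention)
    back_set = set(back)

    def most_back_edges(n: Node, on_path: Set[Node]) -> int:
        # Largest number of back edges on a cycle-free path starting at n.
        best = 0
        for s in succ.get(n, ()):
            if s not in on_path:
                on_path.add(s)
                best = max(best, int((n, s) in back_set) + most_back_edges(s, on_path))
                on_path.remove(s)
        return best

    d = max(most_back_edges(n, {n}) for n in order)
    return order, rpord, back, H, d


SUCC = {1: [2], 2: [3], 3: [2, 4], 4: []}
NESTED = {1: [2], 2: [3], 3: [4], 4: [3, 5], 5: [2, 6], 6: []}
order, rpord, back, H, d = dfst_info(SUCC, 1)
order2, _, back2, H2, d2 = dfst_info(NESTED, 1)
print(order, back, H, d)
print(order2, back2, H2, d2)
```

In the nested example both loops share no cycle-free path that uses both back edges, so d is still 1 even though H contains two nodes.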
The proof that the improved algorithm converges in at most d(G,T) + 1 iterations is presented in three parts. Lemma 3.1 states the information that is attached to each node upon completion of each pass of the algorithm. Lemma 3.2 states a sufficient condition for convergence, and theorem 3.1 shows that if condition (*) holds, then the sufficient condition of lemma 3.2 is satisfied on completion of pass d(G,T) + 1 of the algorithm. A variation of condition (*) is actually used in the proof of theorem 3.1.


Observation 3.4 ([KAM76])

Condition

$$(*)\qquad (\forall f \in F)\,(\forall x \in L)\;\bigl(f(x) \ge x \wedge f(1)\bigr)$$

is equivalent to condition

$$(**)\qquad (\forall f,g \in F)\,(\forall x,y \in L)\;\bigl(fg(y) \ge g(y) \wedge f(x) \wedge x\bigr).$$

Proof: See [KAM76].


Lemma 3.1

On completion of the n-th iteration of the improved algorithm,

$$q.\mathit{lat} = \bigwedge_{p \in \mathrm{PATH}_n(q)} f_p(\mathit{in}) \qquad \text{if } q \notin H, \text{ and}$$

$$q.\mathit{lat} = \bigwedge_{p \in \mathrm{PATH}_n(q)} f_p(\mathit{in}) \;\wedge \bigwedge_{p \in \mathrm{BPATH}_n(q)} f_p(\mathit{in}) \qquad \text{if } q \in H.$$

(Recall that "in" is the input lattice information attached to the entry node.)




Proof:

This lemma can be proven by induction on n, the number of iterations. Its proof is somewhat detailed, and the content of the lemma follows intuitively from the execution of the algorithm, so the proof is omitted. (See the proof of lemma 2 in [KAM76] for a similar proof.)

Lemma 3.2

The improved algorithm halts after no more than n iterations if for all $h \in H$ and all $p \in \mathrm{BPATH}_n(h)$ there exists $Q \subseteq \mathrm{PATH}_n(h)$ such that

$$f_p(\mathit{in}) \ge \bigwedge_{q \in Q} f_q(\mathit{in}).$$

Proof:

Select a given $h \in H$. By hypothesis, each $f_p(\mathit{in})$ with $p \in \mathrm{BPATH}_n(h)$ is bounded below by a meet over some subset of $\mathrm{PATH}_n(h)$, and hence by the meet over all of $\mathrm{PATH}_n(h)$. Since the number of paths in $\mathrm{BPATH}_n(h)$ and $\mathrm{PATH}_n(h)$ is countable,

$$A = \bigwedge_{p \in \mathrm{BPATH}_n(h)} f_p(\mathit{in}) \;\ge\; \bigwedge_{q \in \mathrm{PATH}_n(h)} f_q(\mathit{in}) = B.$$

During the n-th iteration of the algorithm, B is the information attached to node h just after it has been processed, and A is the information transmitted through the back edges for which h is a tail. Upon completion of the n-th iteration, $A \wedge B = B$ is attached to the node, and thus h will not have been "marked." In summary, for every $h \in H$, h is not "marked" upon completion of the n-th iteration, and the algorithm halts.

Theorem 3.1

The improved algorithm halts after at most d(G,T) + 1 iterations if condition (**) holds:

$$(**)\qquad (\forall f,g \in F)\,(\forall x,y \in L)\;\bigl(fg(y) \ge g(y) \wedge f(x) \wedge x\bigr).$$
Proof:

By lemma 3.2, it suffices to show that for all $h \in H$ and all $p \in \mathrm{BPATH}_{d+1}(h)$, there exists $Q \subseteq \mathrm{PATH}_{d+1}(h)$ such that

$$f_p(\mathit{in}) \ge \bigwedge_{q \in Q} f_q(\mathit{in}).$$

(Just let n = d + 1 in lemma 3.2.)

This can be proven by induction on K, the number of back edges in $p = (i_1, i_2, \ldots, i_r)$, where $i_1 = 1$ and $i_r = h$.

Basis step (0 ≤ K ≤ d): Just let $Q = \{p\} \subseteq \mathrm{PATH}_{d+1}(h)$.

Induction step (K > d): Since p contains more than d back edges, it cannot be cycle free, by the definition of d. Pick the highest number, a, such that $i_a = i_b$ for some b > a. Let $p_1 = (i_1, \ldots, i_a)$, $p_2 = (i_a, \ldots, i_b)$, $p_3 = (i_b, \ldots, i_r)$, and let $p_4$ be a path from node 1 to node $i_a$ that contains no back edges. (Such a path, $p_4$, exists for any q ∈ G: just follow the edges of the DFST, T.) Path $p_1$ must contain at least one back edge since $(i_{a+1}, \ldots, i_r)$ is cycle free and thus has at most d back edges. Path $p_2$ contains at least one back edge since it contains a cycle (by observation 3.2). Let $x = f_{p_4}(\mathit{in})$. Then

$$\begin{aligned}
f_p(\mathit{in}) &= f_{p_3}\bigl(f_{p_2}(f_{p_1}(\mathit{in}))\bigr) && \text{(by definition)}\\
&\ge f_{p_3}\bigl(f_{p_1}(\mathit{in}) \wedge f_{p_2}(x) \wedge x\bigr) && \text{(by (**), since } f_{p_3} \text{ is monotone)}\\
&= f_{p_3}\bigl(f_{p_1}(\mathit{in}) \wedge f_{p_2}(f_{p_4}(\mathit{in})) \wedge f_{p_4}(\mathit{in})\bigr)\\
&= f_{p_3}f_{p_1}(\mathit{in}) \wedge f_{p_3}f_{p_2}f_{p_4}(\mathit{in}) \wedge f_{p_3}f_{p_4}(\mathit{in}) && \text{(} f_{p_3} \text{ distributes over } \wedge \text{)}\\
&= f_{p'}(\mathit{in}) \wedge f_{p''}(\mathit{in}) \wedge f_{p'''}(\mathit{in})
\end{aligned}$$

where $p' = (i_1, \ldots, i_a, i_{b+1}, \ldots, i_r)$, $p'' = (p_4, i_{a+1}, \ldots, i_b, i_{b+1}, \ldots, i_r)$, and $p''' = (p_4, i_{b+1}, \ldots, i_r)$.

Each of p', p'', and p''' is a path in G and contains at most K - 1 back edges, and the induction step follows.

Theorem 3.1 states that the improved algorithm halts after at most d(G,T) + 1 iterations when condition (**) holds. Theorem 3.2 (below) states that when the algorithm halts, the lattice information attached to node q represents the information obtained along every path from the entry node to node q; this result is a direct consequence of the work done by Kildall.


Theorem 3.2

When the improved algorithm halts,

$$q.\mathit{lat} = \bigwedge_{p \in \mathrm{PATH}(q)} f_p(\mathit{in}) \qquad \forall\, q \in G.$$

Proof: See [KIL73].


3.2.1  Property  P  Propagation 


The application of these iterative techniques using a variety of different semilattice spaces is a powerful tool, but an arbitrary semilattice structure can be very complex, and the meet operation and flow function may not be trivial. Thus, the price paid for implementing such techniques utilizing an arbitrary semilattice may be:

•  large and complex data structures,

•  complicated programming, and

•  long execution time.

On the other hand, many of the classical global flow analyses (live variable, common subexpression detection, uninitialized variable, etc.) can be implemented within this iterative framework by utilizing a very simple semilattice space (bit vectors), meet operation (bit AND or OR), and flow function. Two special cases will be developed in this section. In essence, the algorithms developed here allow bit information to propagate through the flow graph in a specific way to produce the desired results.

General Framework

Assume there exists an arbitrary property P of interest. Some nodes of the flow graph are sources of property P and some are sinks of property P. We are interested in determining the set of nodes to which property P is actually propagated within a specific flow graph.

A node n obtains attribute A, the "completion" of property P, depending on how property P is propagated to node n. A node n propagates property P to all its successors iff either n has attribute A (which depends on property P) and n is not a sink of property P, or n is a source of property P. The two methods of obtaining attribute A that are of interest here are:

ALL) A node n obtains attribute A iff every predecessor of n propagates property P to it.

ANY) A node n obtains attribute A iff there exists at least one predecessor of n which propagates property P to it.

An intuitive idea of the use of this framework can be obtained from the following analogy involving fluid flow:

Let property P correspond to a fluid which emanates from sources of P and is absorbed at sinks of P. The fluid can flow only in the direction of the edges and can activate a node (i.e., enter the node and be propagated to all successors of the node) only under certain conditions, such as:

•  fluid must be available from at least one predecessor, or

•  fluid must be available from all predecessors.

One way of characterizing such a fluid flow is to specify which nodes have been activated (i.e., have obtained attribute A).


Example:

Let property P be "node n has been encountered." The only source of property P is node n, and there are no sinks of property P. If method ANY is applied, then attribute A is "n is an ancestor of this node." If ALL is applied, then attribute A is "n dominates this node."

The example suggests several generalizations. First, we are usually interested in a set of properties, {P[i]}. For instance, we are normally interested in the ancestral or dominance relation of all the nodes, not just a specific node n. This can be accomplished by considering an M-tuple of properties instead of just a single property. Second, we may wish to propagate information from successors to predecessors (instead of vice versa). For example, we may be interested in the descendant relation or post dominance.


Specialized    Forms 

PROP-ALL (Figure 3.5) implements method ALL. The form of the algorithm is the same as that of FLOW3, with the information space (M-tuple), meet operation (AND), and flow functions explicitly specified. Since the flow function is dependent on only one node, its evaluation (assigned to t1) has been moved outside the inner for loop.

In PROP-ALL (Figure 3.5):

•  G is the flow graph to be analyzed.

•  e is the entry (or exit) node of G.

•  in is the lattice information (an M-tuple) to be attached to the entry node.

•  X can take on one of two values, "pred" and "succ." It indicates whether the algorithm "moves" forward or backward through the flow graph. If X = "succ", then e is the entry node of G. If X = "pred", then e is the exit node of G. If G has more than one exit node, a new node, q, is allocated and all exit nodes are made predecessors of q; thus, q becomes the unique exit node.

•  Each of q.lat, q.so and q.si is an M-tuple, where M is the number of properties being considered.

•  q.so indicates whether node q is a source of the properties.

•  q.si indicates whether node q is a sink of the properties.

•  q.lat is the lattice information being determined by the analysis, and corresponds to attribute A.


PROP-ALL(G,e,in,X)
   for every q ∈ G
      q.lat <- '11...1'B;   /* M-tuple */
      q.mark <- '1'B;
   rof;
   e.lat <- in;
   do while (⌈q.mark = '1'B for some q ∈ G⌉)
      for i <- 1 to |G|
         if q[i].mark then
            q[i].mark <- '0'B;
            t1 <- q[i].so | (q[i].lat & (¬q[i].si));
            for every x ∈ X-cessor(q[i])
               t2 <- t1 & x.lat;
               if t2 ≠ x.lat then
                  x.lat <- t2;
                  x.mark <- '1'B;
               fi;
            rof;
         fi;
      rof;
   od;
END PROP-ALL

Figure 3.5
SPECIFICATION OF ALL

PROP-SOME (Figure 3.6) implements method ANY. Note that the only differences between PROP-SOME and PROP-ALL are the initialization of the lattice information and the meet operation (AND vs. OR).

Even though these algorithms are simple, they are capable of performing most of the classical optimization analyses. Note that although optimization analyses are normally performed on "associated" properties (such as the dominance relationship between all the nodes), these algorithms are general enough that the properties being analyzed need not have any relationship to one another. As the number of properties addressed within a single analysis is increased (i.e., as M is increased), the number of nodes processed during the analysis may increase, but the d(G,T) + 1 convergence bound still holds.


PROP-SOME(G,e,in,X)
   for every q ∈ G
      q.lat <- '00...0'B;   /* M-tuple */
      q.mark <- '1'B;
   rof;
   e.lat <- in;
   do while (⌈q.mark = '1'B for some q ∈ G⌉)
      for i <- 1 to |G|
         if q[i].mark then
            q[i].mark <- '0'B;
            t1 <- q[i].so | (q[i].lat & (¬q[i].si));
            for every x ∈ X-cessor(q[i])
               t2 <- t1 | x.lat;
               if t2 ≠ x.lat then
                  x.lat <- t2;
                  x.mark <- '1'B;
               fi;
            rof;
         fi;
      rof;
   od;
END PROP-SOME

Figure 3.6
SPECIFICATION OF ANY
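The bit-vector specializations of Figures 3.5 and 3.6 map naturally onto machine integers. The Python sketch below is one possible rendering, not the thesis's code. It assumes that M-tuples are stored as int bit masks of width M, that so and si are dicts from node to mask (a missing node means no bits), that `xcessor` maps each node to its successors (X = "succ") or predecessors (X = "pred"), and that nodes are processed in reverse postorder of a depth-first search over those edges from e. The names `prop` and `reverse_postorder` are hypothetical. It is exercised on the Example of the general framework: node n is the only source of property P_n and there are no sinks, so method ALL yields dominators and method ANY yields ancestors.

```python
from typing import Dict, Hashable, List

Node = Hashable


def reverse_postorder(xcessor: Dict[Node, List[Node]], entry: Node) -> List[Node]:
    """Reverse postorder of a depth-first search over X-cessor edges starting at entry."""
    seen, post = set(), []

    def dfs(n: Node) -> None:
        seen.add(n)
        for s in xcessor.get(n, ()):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    return post[::-1]


def prop(xcessor, entry, in_bits, so, si, width, method):
    """PROP-ALL (method "ALL": AND, start at all ones) or PROP-SOME (method "ANY": OR, all zeros)."""
    mask = (1 << width) - 1
    if method == "ALL":
        meet, top = (lambda a, b: a & b), mask
    else:
        meet, top = (lambda a, b: a | b), 0
    order = reverse_postorder(xcessor, entry)
    lat = {q: top for q in order}
    mark = {q: True for q in order}
    lat[entry] = in_bits
    passes = 0
    while any(mark.values()):
        passes += 1
        for q in order:
            if not mark[q]:
                continue
            mark[q] = False
            t1 = so.get(q, 0) | (lat[q] & ~si.get(q, 0) & mask)   # what q propagates
            for x in xcessor.get(q, ()):
                t2 = meet(t1, lat[x])
                if t2 != lat[x]:
                    lat[x] = t2
                    mark[x] = True
    return lat, passes


# Hypothetical graph with a loop between nodes 3 and 4; bit n stands for property P_n.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [3, 5], 5: []}
SO = {n: 1 << n for n in SUCC}
dom, dom_passes = prop(SUCC, 0, 0, SO, {}, 6, "ALL")
anc, anc_passes = prop(SUCC, 0, 0, SO, {}, 6, "ANY")
print(dom, dom_passes)
print(anc, anc_passes)
```

In the ALL result, bit n of dom[q] is set when every path from node 0 passes through n before reaching q, so node 4 is dominated by nodes 0 and 3 (mask 0b1001). In the ANY result, nodes 3, 4 and 5 see all of nodes 0 to 4, because the loop makes 3 and 4 ancestors of each other and of themselves.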

Lattice Properties And Convergence

The information space of bit vectors has the required associative and commutative properties of a lattice space under the meet operations of bitwise AND and bitwise OR. It is easily verified that the flow function

$$f(q,\cdot,x) = q.\mathit{so} \mathbin{|} \bigl(x \mathbin{\&} \neg(q.\mathit{si})\bigr)$$

used in PROP_ALL and PROP_SOME has the required homomorphism property. Thus, these analyses are guaranteed to converge in a finite time. These particular frameworks also satisfy condition (*), so the analysis is guaranteed to converge in d(G,T) + 1 iterations of the algorithm. Table 3.1 verifies that condition (*) holds for PROP_ALL. A similar table can be constructed for PROP_SOME.


  si   so   x   f(x)   f(1)   x ∧ f(1)
   0    0   0     0      1       0
   1    0   0     0      0       0
   0    1   0     1      1       0
   1    1   0     1      1       0
   0    0   1     1      1       1
   1    0   1     0      0       0
   0    1   1     1      1       1
   1    1   1     1      1       1

Table 3.1
VERIFICATION OF (*) FOR PROP_ALL
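Table 3.1 checks condition (*) one bit position at a time, which suffices because the flow functions of PROP_ALL and PROP_SOME act on each position independently. The same check can be automated for both meets, and extended to condition (**) of Observation 3.4 by first closing the single-bit flow functions under composition to obtain F. The Python sketch below does this exhaustively; the helper names are hypothetical, and it assumes that in the OR lattice used by PROP_SOME the top element is the bit 0.

```python
from itertools import product
from operator import and_, or_

DOMAIN = (0, 1)   # a single bit position


def table(so: int, si: int) -> tuple:
    """Output table (f(0), f(1)) of f(x) = so | (x & ~si) on one bit."""
    return tuple(so | (x & (1 - si)) for x in DOMAIN)


def closure(tables: set) -> set:
    """F: the generating functions closed under composition (a applied after b)."""
    F = set(tables)
    while True:
        new = {tuple(a[b[x]] for x in DOMAIN) for a in F for b in F} - F
        if not new:
            return F
        F |= new


def ge(meet, a: int, b: int) -> bool:
    """a >= b in the semilattice order defined by meet."""
    return meet(a, b) == b


def holds_star(F: set, meet, top: int) -> bool:
    """Condition (*): f(x) >= x ^ f(1) for all f in F and all x, where 1 = top."""
    return all(ge(meet, f[x], meet(x, f[top])) for f in F for x in DOMAIN)


def holds_double_star(F: set, meet) -> bool:
    """Condition (**): fg(y) >= g(y) ^ f(x) ^ x for all f, g in F and all x, y."""
    return all(ge(meet, f[g[y]], meet(meet(g[y], f[x]), x))
               for f in F for g in F for x in DOMAIN for y in DOMAIN)


F = closure({table(so, si) for so, si in product(DOMAIN, DOMAIN)})
# PROP_ALL: meet is AND, top is bit 1.  PROP_SOME: meet is OR, top is bit 0.
print(len(F), holds_star(F, and_, 1), holds_star(F, or_, 0),
      holds_double_star(F, and_), holds_double_star(F, or_))
```

On one bit the four (so, si) combinations reduce to three distinct functions (identity, constant 0, constant 1), and that set is already closed under composition, so the check is small but covers every row of Table 3.1.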


PROP_ALL and PROP_SOME process the nodes in a predetermined order; i.e., reverse postorder as determined by a specific DFST. When X = "succ", this DFST is generated from the original flow graph, G. When X = "pred", this DFST is generated from the inverted flow graph, G'; i.e., the graph obtained by reversing the direction of the edges of G and making the exit node of G the entry node of G'.

The next four subsections present specific applications of property P propagation.
3.2.1.1  Live  Variables

The purpose of a live variable analysis is to determine those regions of the program in which a specific variable, V, has a subsequent reference to the current data stored in V. A variable V is live at a particular node of the flow graph if there exists a reference to the current value of V at some descendant node.


Apply PROP_SOME with

in: the vector of all zeros

Property P: "variable V has been referenced"

Sources of P: nodes at which V is referenced

Sinks of P: nodes at which V is assigned

X: "pred"

Attribute A: "V is live"


Analysis 3.1
LIVE VARIABLES


Analysis 3.1 presents an outline for determining the region of the flow graph in which variables are live. Live variable analysis will be discussed in some detail in order to present a concrete example of property P propagation and to provide a comparison between Kam & Ullman's algorithm and the improved algorithm within the context of a specific flow graph and lattice space.
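Since Figure 3.7 is not reproduced here, Analysis 3.1 can be tried on a small hypothetical program instead. Node 1 reads A and B, node 2 computes B = A * 2, node 3 computes A = B + 1, node 4 branches back to node 2 while A < 10, and node 5 (the exit) prints B. The Python sketch below is a bit-vector rendering of PROP_SOME under stated assumptions: variables are bit positions in an int, sources and sinks are dicts from node to mask, and X = "pred" is realized by passing the predecessor map and starting from the exit node. All names are hypothetical.

```python
from typing import Dict, Hashable, List

Node = Hashable


def reverse_postorder(xcessor: Dict[Node, List[Node]], entry: Node) -> List[Node]:
    """Reverse postorder of a depth-first search over X-cessor edges starting at entry."""
    seen, post = set(), []

    def dfs(n: Node) -> None:
        seen.add(n)
        for s in xcessor.get(n, ()):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    return post[::-1]


def prop_some(xcessor, entry, in_bits, so, si):
    """Bit-vector sketch of PROP-SOME: meet is OR and every lat starts at all zeros."""
    order = reverse_postorder(xcessor, entry)
    lat = {q: 0 for q in order}
    mark = {q: True for q in order}
    lat[entry] = in_bits
    passes = 0
    while any(mark.values()):
        passes += 1
        for q in order:
            if not mark[q]:
                continue
            mark[q] = False
            t1 = so.get(q, 0) | (lat[q] & ~si.get(q, 0))   # what q propagates
            for x in xcessor.get(q, ()):
                t2 = t1 | lat[x]
                if t2 != lat[x]:
                    lat[x] = t2
                    mark[x] = True
    return lat, passes


A, B = 0b01, 0b10                        # bit positions of variables A and B
PRED = {1: [], 2: [1, 4], 3: [2], 4: [3], 5: [4]}
REFS = {2: A, 3: B, 4: A, 5: B}          # sources of P: nodes referencing the variable
ASSIGNS = {1: A | B, 2: B, 3: A}         # sinks of P: nodes assigning the variable

live, passes = prop_some(PRED, 5, 0, REFS, ASSIGNS)   # in = all zeros at the exit node
# An assignment whose variable is not live after the node stores unreferenced data.
dead = {q: m & ~live[q] for q, m in ASSIGNS.items() if m & ~live[q]}
print(live, passes, dead)
```

Because the iteration runs backwards, live[q] is the set of variables live on exit from node q. B is not live after node 1, so the value of B read there is unreferenced data, the same situation as YR at L17 below.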
Figure 3.7
FLOW GRAPH OF PROGRAM IN FIGURE 2.1


The flow graph shown in Figure 3.7 corresponds to the program in Figure 2.1. Sinks and sources are shown at the right; the correspondence between positions in the bit vectors and variables of the program is, from left to right: XL, XR, XM, YL, YR, YM, EPS, DELTA, FOOT, ITER. Note that the exit node (L20) is both a sink and a source for all variables in the parameter list; this is only an assumption and can be modified if more information is known about the context(s) in which the subroutine is called.


The DFST is generated from the inverted flow graph, G', of the graph shown in Figure 3.7. The specific DFST, T, used in the analysis is not shown, but yields d(G',T) = 1 and a reverse postorder of: L20, L19, L6, L5, L17, L16, L14, L13, L12, L11, L10, L9, L8, L7, L4, L3, L2. The only back edge is the edge between L7 and L6.
Table 3.2 shows the lattice space configuration after the completion of each pass of PROP_SOME upon application of Analysis 3.1. Note that Pass 0 corresponds to the initialization of the analysis. Nodes of the flow graph are processed in the order shown at the left. During Pass 1 every node is processed. Upon completion of Pass 1, node L6 is the only node to which new information has been propagated after processing the node. When L6 is processed, only node L19 has propagated information (1110001100) to L6. Subsequent to the processing of L6, node L7 propagates new information (1101001001) to L6. During Pass 2 only nine nodes are processed (indicated by the *). The analysis converges during the second pass, indicated by the fact that "mark" is 0 for all nodes.

A comparison of Kam & Ullman's algorithm with the improved algorithm yields the following results. The improved form processes 26 nodes in two passes. If Kam & Ullman's algorithm had been applied, it would have processed a total of 51 nodes in three passes. Thus, in this specific example, the improved algorithm converges roughly twice as fast as Kam & Ullman's algorithm (26 versus 51 node visits).

Note that YR is dead at L17. This fact, plus the fact that L17 is an assign point of YR, means that the value of YR assigned at L17 constitutes unreferenced data (see Section 2.1).


3.2.1.2  Common  Subexpressions 

The purpose of common subexpression analysis is to determine those expressions for which the current value is still "available" from a previous calculation. If such an expression is detected, subsequent calculations of the expression can be eliminated and the "available" value used instead. Two conditions must be satisfied for such a redundant calculation to exist:

1) each predecessor path must have a compute point of the expression, X (e.g., X = A + B), and

2) each of these paths must be free of assignments to any of the defining variables of X (e.g., A and B).

An expression, X, computed at node n is recalculable at node m if node n dominates node m and all paths from node n to node m are free of assign points of defining variables of X. Before a global flow analysis can be performed, a prepass must be performed to determine subexpressions within the program that are, in some sense, equivalent; i.e., candidate common subexpressions. Analysis 3.2 presents an outline for determining that region of the flow graph in which an expression, X, is recalculable. The computation of an expression, X, can be eliminated at any compute point of X at which X is recalculable.


Apply PROP_ALL with

in: the vector of all zeros

Property P: "expression X has been computed"

Sources of P: nodes at which X is computed

Sinks of P: nodes at which defining variables of X are assigned

X: "succ"

Attribute A: "X is recalculable"


Analysis 3.2
COMMON SUBEXPRESSIONS
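Analysis 3.2 can be tried on a small hypothetical program: node 1 computes A + B, node 2 branches to node 3 (A = 0) or node 4 (which computes C * D), and nodes 5 and 6 each compute A + B. Bit 0 stands for A + B and bit 1 for C * D. The Python sketch below is a bit-vector rendering of PROP_ALL under stated assumptions: expressions are bit positions in an int of the given width, sources and sinks are dicts from node to mask, and nodes are visited in reverse postorder of a depth-first search from the entry. All names are hypothetical.

```python
from typing import Dict, Hashable, List

Node = Hashable


def reverse_postorder(xcessor: Dict[Node, List[Node]], entry: Node) -> List[Node]:
    """Reverse postorder of a depth-first search over X-cessor edges starting at entry."""
    seen, post = set(), []

    def dfs(n: Node) -> None:
        seen.add(n)
        for s in xcessor.get(n, ()):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    return post[::-1]


def prop_all(xcessor, entry, in_bits, so, si, width):
    """Bit-vector sketch of PROP-ALL: meet is AND and every lat starts at all ones."""
    mask = (1 << width) - 1
    order = reverse_postorder(xcessor, entry)
    lat = {q: mask for q in order}
    mark = {q: True for q in order}
    lat[entry] = in_bits
    passes = 0
    while any(mark.values()):
        passes += 1
        for q in order:
            if not mark[q]:
                continue
            mark[q] = False
            t1 = so.get(q, 0) | (lat[q] & ~si.get(q, 0) & mask)   # what q propagates
            for x in xcessor.get(q, ()):
                t2 = t1 & lat[x]
                if t2 != lat[x]:
                    lat[x] = t2
                    mark[x] = True
    return lat, passes


AB, CD = 0b01, 0b10                      # bit 0: A + B, bit 1: C * D
SUCC = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
COMPUTES = {1: AB, 4: CD, 5: AB, 6: AB}  # sources of P: compute points
KILLS = {3: AB}                          # sinks of P: node 3 assigns A

avail, passes = prop_all(SUCC, 1, 0, COMPUTES, KILLS, 2)
# A compute point at which its own expression is recalculable can be eliminated.
redundant = {q: m & avail[q] for q, m in COMPUTES.items() if m & avail[q]}
print(avail, passes, redundant)
```

A + B is not recalculable at node 5 because the path through node 3 assigns A, but it is recalculable at node 6, so the computation there could be eliminated. C * D is computed on only one of the paths into node 5, so it never becomes recalculable.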


3.2.1.3  P-dominance 


P-dominance is a generalization of dominance. Here we are interested in those nodes which are dominated by the set of nodes with property P. A node, n, is P-dominated if every path from e (the entry node) to n contains a node which is a source of property P. Analysis 3.3 presents a scheme for determining those regions of the flow graph that are P-dominated.


Apply  PROP_ALL  with 

in:  the  vector  of  all  zeros 

Property  P:  property  P 

Sources  of  P:  sources  of  P 

Sinks  of  P:  none 

X:    "succ"    ("pred"    if   we    want   post    P-dominance) 

Attribute    A:    "this    node    is    P-dominated" 


Analysis 3.3
P-DOMINANCE
3.2.1.4  Uninitialized  Variables


A specific case of P-dominance occurs in the detection of uninitialized variables. A variable, V, at node n is (partially or totally) uninitialized if there exists a path from e (the entry node) to n which does not assign a value to variable V. Thus, after determining the region of the program that is dominated by assignments to V, any reference to V outside of that region constitutes a reference to an uninitialized variable. Analysis 3.4 presents a scheme for determining those regions of the flow graph that are dominated by assign points of variable V.


Apply PROP_ALL with

in: the vector of all zeros

Property P: "variable V has been assigned"

Sources of P: nodes at which V is assigned

Sinks of P: none

X: "succ"

Attribute A: "variable V has been initialized"


Analysis 3.4
UNINITIALIZED VARIABLES
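Analysis 3.4 can be exercised on a small hypothetical program: node 1 reads X, node 2 tests X and branches to node 3 (Y = 1) or node 4 (print X), and node 5 prints Y. Bit 0 stands for X and bit 1 for Y. The Python sketch below is a bit-vector rendering of PROP_ALL under stated assumptions: variables are bit positions in an int, assign points are the sources, there are no sinks, and a reference is reported when its variable's bit is not set on entry to the referencing node. All names are hypothetical.

```python
from typing import Dict, Hashable, List

Node = Hashable


def reverse_postorder(xcessor: Dict[Node, List[Node]], entry: Node) -> List[Node]:
    """Reverse postorder of a depth-first search over X-cessor edges starting at entry."""
    seen, post = set(), []

    def dfs(n: Node) -> None:
        seen.add(n)
        for s in xcessor.get(n, ()):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    return post[::-1]


def prop_all(xcessor, entry, in_bits, so, si, width):
    """Bit-vector sketch of PROP-ALL: meet is AND and every lat starts at all ones."""
    mask = (1 << width) - 1
    order = reverse_postorder(xcessor, entry)
    lat = {q: mask for q in order}
    mark = {q: True for q in order}
    lat[entry] = in_bits
    passes = 0
    while any(mark.values()):
        passes += 1
        for q in order:
            if not mark[q]:
                continue
            mark[q] = False
            t1 = so.get(q, 0) | (lat[q] & ~si.get(q, 0) & mask)   # what q propagates
            for x in xcessor.get(q, ()):
                t2 = t1 & lat[x]
                if t2 != lat[x]:
                    lat[x] = t2
                    mark[x] = True
    return lat, passes


X, Y = 0b01, 0b10                        # bit positions of variables X and Y
SUCC = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
ASSIGNS = {1: X, 3: Y}                   # sources of P: assign points (no sinks)
REFS = {2: X, 4: X, 5: Y}                # references to check

init, passes = prop_all(SUCC, 1, 0, ASSIGNS, {}, 2)
uninit = {q: m & ~init[q] for q, m in REFS.items() if m & ~init[q]}
print(init, passes, uninit)
```

Y is reported at node 5 because the path 1, 2, 4, 5 never assigns it, while every reference to X lies inside the region dominated by node 1.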
3.2.2  Invariant  Assertion  Analysis

The techniques presented in section 3.2.1 are extremely useful in global flow problems because they are easily implemented and efficiently executed by making use of the internal binary properties and instructions available on most machines, but they can only be used when the information to be propagated through the flow graph can be encoded into a bit vector. It is, of course, preferable to use these bit propagation techniques instead of techniques that use a more complicated lattice space. A given complex global flow problem can often be decomposed into simpler problems to which the bit propagation techniques can be applied; this logic is incorporated into several of the detection schemes presented in Chapter 4. The invariant assertion analysis presented in this section addresses a very complex global flow problem and requires a sophisticated lattice space and flow function.

Initial Approach

The objective of invariant assertion analysis is to associate with each node of the flow graph an assertion that is true (prior to execution of the corresponding source statement) independent of the particular execution path taken by the corresponding program. The amount (and usefulness) of the information that can be extracted from such an assertion depends on the basic form of the assertion and the sophistication of the analysis used to produce it.
Assertion information is transmitted through the flow graph in the following manner.

Each node of the flow graph has an input assertion attached to it, which is some combination of the output assertions of all its predecessor nodes. As a given node, n, is processed, the output assertion is generated based on:

•  the input assertion,

•  the internal manipulation that occurs within the source statement associated with the node, and

•  the specific exit path (i.e., the edge to a specific successor) through which the assertion is to be transmitted.

This output assertion is then transmitted to the specific successor of n and becomes part of its input assertion.

The assertions mentioned above can take on a number of
different forms with correspondingly different semantics. For
instance, the assertions might contain information about the
nesting level of the loop structure or block structure of the
program being analyzed. Of course, the nature of the assertions
will determine the processing required to generate the output
assertion as a function of the input assertion.


The assertions used in the sample analysis presented in the
next section contain information about the relationship of
variables to constants. As an example of how such information
might be useful, consider the problem of division by zero.
Assuming the assertions and analysis are capable of encoding the
relationship between variables and constants, each division
performed in the program can be checked to see if its corresponding
divisor is zero. Thus, if a component of the assertion attached to

    A = B/C

is

    [C=0],

then an appropriate "division by zero" statement can be presented
to the student.


3.2.2.1  A Specific Realization


This section presents a specific realization of invariant
assertion analysis in which the assertions encode information about
the relationship of variables to constants. This section includes
a description of:

•  the assertions (i.e., the lattice space),
   which are conjunctive normal forms of basic assertions,

•  the flow function,
   which is essentially the logical AND of the input assertion
   attached to the node and the assertion generated by the
   execution of the source statement corresponding to the
   node, and

•  the meet operation,
   which is the logical OR of the assertions propagated by
   predecessors.

The particular realization presented here was picked because it is
sufficient to detect the specific anomalies that must be detected
despite its relative simplicity (compared to other realizations).
A discussion of various extensions of this realization is presented
in Section 3.2.2.2.


Assertion    Algorithm 

Figure 3.8 presents the specialized form of FLOW3 (Figure 3.4)
that is used in this realization of the invariant assertion
analysis.

    ASSERT_GEN (G, e)

       for every g ∈ G
          g.assert <- [FALSE];  g.mark <- '1'B;
       rof;

       e.assert <- [TRUE];

       do while (⌈g.mark = '1'B for some g ∈ G⌉)
          for i <- 1 to |G|
             if g[i].mark then
                g[i].mark <- '0'B;
                in_assert <- g[i].assert;
                for every s ∈ succ(g[i])
    STEP1:         out_assert <- EXECUTE(g[i], s, in_assert);
                   temp <- out_assert ∨ s.assert;
                   if temp ≠ s.assert then
                      s.assert <- temp;
                      s.mark <- '1'B;
                   fi;
                rof;
             fi;
          rof;
       od;
    END ASSERT_GEN

                         Figure 3.8
               ASSERTION GENERATING ALGORITHM

Note that the flow function depends on the particular exit edge
through which the output assertion is to be transmitted. There will
be more than one exit edge from a node only if the node corresponds
to a conditional test; in this case the different exit edges
correspond to different exit conditions. The function EXECUTE (at
STEP1 of Figure 3.8) generates an execution assertion which
represents the results of the execution of the corresponding source
statement, and combines it with the input assertion to produce the
output assertion.
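The control structure of Figure 3.8 can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the lattice is hidden behind hypothetical `execute` and `meet` callables, `bottom` and `top` stand in for [FALSE] and [TRUE], and the node list is assumed to already be in the desired processing order (for example, reverse postorder).

```python
from typing import Callable, Dict, Hashable, List, TypeVar

A = TypeVar("A")  # a lattice element, e.g. an assertion (assumed abstraction)


def assert_gen(nodes: List[Hashable],
               succ: Dict[Hashable, List[Hashable]],
               entry: Hashable,
               execute: Callable[[Hashable, Hashable, A], A],
               meet: Callable[[A, A], A],
               bottom: A,
               top: A) -> Dict[Hashable, A]:
    """Round-robin iteration modeled on ASSERT_GEN (Figure 3.8)."""
    assertion = {g: bottom for g in nodes}   # g.assert <- [FALSE]
    mark = {g: True for g in nodes}          # g.mark <- '1'B
    assertion[entry] = top                   # e.assert <- [TRUE]
    while any(mark.values()):
        for g in nodes:                      # for i <- 1 to |G|
            if not mark[g]:
                continue
            mark[g] = False
            in_assert = assertion[g]
            for s in succ.get(g, []):
                out_assert = execute(g, s, in_assert)   # STEP1
                temp = meet(out_assert, assertion[s])   # logical OR in this realization
                if temp != assertion[s]:
                    assertion[s] = temp
                    mark[s] = True                      # s must be reprocessed
    return assertion


# Toy use: the two-point lattice {False, True} with OR as the meet
# computes which nodes are reachable from the entry node 0.
reach = assert_gen([0, 1, 2, 3], {0: [1], 1: [2], 2: [1]}, 0,
                   execute=lambda n, s, a: a,
                   meet=lambda a, b: a or b,
                   bottom=False, top=True)
print(reach)  # {0: True, 1: True, 2: True, 3: False}
```

Instantiating `meet` as the OR of assertions and `execute` as EXECUTE would give the invariant assertion analysis; termination then rests on the finiteness of the lattice.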

Assertions

There are three elementary types of assertions associated with
this realization of invariant assertion analysis: input assertions,
execution assertions, and output assertions. An input assertion is
attached to each node of the flow graph and is interpreted as an
assertion that is true prior to execution. When a node is
processed, an execution assertion (possibly a set of execution
assertions) is generated, which represents the essence of the
action that occurs during the execution of the source statement
corresponding to the node. The execution assertion is based on
information extracted from the input assertion, as well as data
manipulation that occurs within the node, and may, in fact, nullify
information present in the input assertion. The output assertion,
which is propagated to a specific successor, is a combination of
the execution assertion and the input assertion (possibly modified
to make it compatible with the execution assertion). When a given
successor node, n, is processed, the OR of the output assertions of
all predecessor nodes has been collected and attached to node n;
this then becomes its input assertion.

The form and content of execution assertions will be discussed
later in this section along with the definition of the flow
function.
Basic Assertions

Basic assertions are of the form:

    <basic assert> ::= [<rel assert>]⁴
                    |  [FALSE]
                    |  [TRUE]
    <rel assert>   ::= <var> <rel> <const>
    <var>          ::= variable
    <const>        ::= constant
    <rel>          ::= <  | <1,0,0>⁵
                    |  =  | <0,1,0>
                    |  ≤  | <1,1,0>
                    |  >  | <0,0,1>
                    |  ≥  | <0,1,1>
                    |  ≠  | <1,0,1>

Some examples of basic assertions are:

    [X>3]
    [Y<8]
    [FALSE]

The two basic assertions [V<1,1,1>C] and [V<0,0,0>C] will be
interpreted as [TRUE] and [FALSE], respectively.

⁴ Basic assertions will always be enclosed in square brackets for
readability.

⁵ The parameterized forms of the relational operators are
introduced so that they can be referenced in groups; i.e., <1,x,0>
stands for '<' if x = 0 and '≤' if x = 1.
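The parameterized relations can be made concrete with a small Python sketch. The dictionary `REL`, the function names, and the use of Python numbers for constants are illustrative assumptions; the point is that each relation is a triple of flags for '<', '=', '>', so that <1,1,1> admits every value ([TRUE]) and <0,0,0> admits none ([FALSE]).

```python
from typing import Tuple

Rel = Tuple[int, int, int]  # flags for "<", "=", ">" (the <x,y,z> triple)

# Assumed spelling of the six relational operators as triples.
REL = {
    "<": (1, 0, 0), "=": (0, 1, 0), "<=": (1, 1, 0),
    ">": (0, 0, 1), ">=": (0, 1, 1), "!=": (1, 0, 1),
}


def holds(rel: Rel, value: float, const: float) -> bool:
    """True iff `value <rel> const` for the relation triple `rel`."""
    x, y, z = rel
    return bool((x and value < const)
                or (y and value == const)
                or (z and value > const))


def less_or(x: int) -> Rel:
    """The grouped form <1,x,0>: '<' when x = 0, '<=' when x = 1."""
    return (1, x, 0)


print(holds(REL["<="], 3, 3), holds(REL["!="], 3, 3))  # True False
print(holds((1, 1, 1), -7, 0), holds((0, 0, 0), 0, 0))  # True False
```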
Input And Output Assertions

Input and output assertions are held in conjunctive normal
form; i.e., $A = A_1 \wedge A_2 \wedge \dots \wedge A_n$, where each $A_i$ is a
disjunction of basic assertions. Thus, an input assertion might
look like:
    ([X≥1] ∨ [Y=3]) ∧ ([Z≠5] ∨ [X=2]) ∧ [Z=2].

Conjunctive normal form is used for the following reasons:

1)  $$\bigwedge_{i=1}^{n} A_i \;\Rightarrow\; \Big(\bigwedge_{j=1}^{k-1} A_j\Big) \wedge \Big(\bigwedge_{j=k+1}^{n} A_j\Big)$$
    (i.e., the kth conjunct has been omitted).⁶

    In other words, if an assertion in conjunctive form is known
    to be true, a particular conjunct can be omitted (a heuristic
    may be applied if the conjunct gets too large or becomes too
    complicated to analyze) and the remaining assertion is still
    true.

2)  When disjunctive normal form is used, the assertions become
    "fragmented." For instance, in applying invariant assertion
    analysis with disjunctive normal form it has been found that
    assertions such as the following are produced:

          ([I=1] ∧ [J=-9] ∧ [K=1])
        ∨ ([I>1] ∧ [I<10] ∧ [J=-9] ∧ [K=1])
        ∨ ([I=1] ∧ [J>-9] ∧ [J<0] ∧ [K=1])
        ∨ ([I>1] ∧ [I<10] ∧ [J>-9] ∧ [J<0] ∧ [K=1])

    whereas, when exactly the same analysis is applied using
    conjunctive normal form, the corresponding assertion produced is

        [I≥1] ∧ [I<10] ∧ [J≥-9] ∧ [J<0] ∧ [K=1].

    This second form of the assertion is much more compact and
    concise.

⁶ In this section the symbol ⇒ represents logical implication.
Of course, the fragmented assertion generated by the use of
disjunctive normal form can be reduced to this second form by a
series of simplifications, but this is costly in both time and
space. The conjunctive normal form seems to produce concise
assertions that require little or no simplification.


The Meet Operation

The meet operation is logical OR. As each node is processed,
the output assertion is transmitted to its corresponding successor,
s, and ORed with the assertion already attached to s. The
collection of these ORed output assertions is taken as the input
assertion when node s is processed. Since assertions are held in
conjunctive normal form, the OR must be distributed over the
component assertions. A standard set of simplifications is used to
combine basic assertions.
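Distributing the OR over two assertions in conjunctive normal form pairs every conjunct of one with every conjunct of the other. The sketch below assumes a hypothetical representation of an assertion as a frozenset of clauses, each clause a frozenset of basic assertions written as plain strings; the simplifications of Tables 3.3 and 3.4 are deliberately left out.

```python
from itertools import product
from typing import FrozenSet

Clause = FrozenSet[str]   # disjunction of basic assertions, e.g. {"J=3"} (assumed encoding)
CNF = FrozenSet[Clause]   # conjunction of clauses

TRUE: CNF = frozenset()                  # empty conjunction
FALSE: CNF = frozenset({frozenset()})    # a single empty disjunction


def cnf(*clauses) -> CNF:
    """Build a CNF assertion from iterables of basic-assertion strings."""
    return frozenset(frozenset(c) for c in clauses)


def cnf_or(a: CNF, b: CNF) -> CNF:
    """(A1 ∧ ... ∧ An) ∨ (B1 ∧ ... ∧ Bm) = ∧ over all i, j of (Ai ∨ Bj)."""
    return frozenset(ca | cb for ca, cb in product(a, b))


meet = cnf_or(cnf(["J=3"]), cnf(["J>5"], ["J<9"]))
print(sorted(sorted(c) for c in meet))
# [['J<9', 'J=3'], ['J=3', 'J>5']]
```

The empty-conjunction and empty-clause encodings make the universal bounds fall out automatically: ORing with TRUE yields TRUE and ORing with FALSE returns the other operand.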

Recall from the lattice theory discussion that the lattice
space used by the iterative global flow algorithm must be finite.
The lattice space of assertions presented here satisfies this
condition because the number of variables used in a given program
is finite and the number of constants representable in any given
machine is finite.

Simplifications

Simplifications must be applied to the assertions to conserve
memory space and make the assertions more concise (and thus more
useful because information can be extracted more easily). Besides
the obvious use of the distributive law, commutative law, and
associative law, two types of simplifications should be applied:
standard simplifications and special simplifications.


    A ∨ A          ⇒  A        (Idempotent)
    A ∧ A          ⇒  A
    A ∧ (A ∨ B)    ⇒  A        (Absorption)
    A ∨ (A ∧ B)    ⇒  A
    A ∨ TRUE       ⇒  TRUE     (Universal bounds)
    A ∧ TRUE       ⇒  A
    A ∨ FALSE      ⇒  A
    A ∧ FALSE      ⇒  FALSE

    A and B are arbitrary assertions.

                        Table 3.3
                STANDARD SIMPLIFICATIONS


The standard simplifications presented in Table 3.3 represent
well known transformations applicable to assertions. These
simplifications deal with arbitrary assertions and are not
dependent on the semantics of the assertions involved. These
standard simplifications should be applied before applying the
specialized simplifications.
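When an assertion is held as a set of clauses, several rules of Table 3.3 come almost for free. The following sketch assumes a hypothetical representation (clauses as sets of string literals, with the literals "TRUE" and "FALSE" treated specially) and applies idempotence, the universal bounds, and the clause-level form of absorption, c ∧ (c ∨ B) ⇒ c; the dual rule A ∨ (A ∧ B) ⇒ A concerns disjunctive form and is not needed here.

```python
from typing import FrozenSet, Iterable

CNF = FrozenSet[FrozenSet[str]]  # assumed: set of clauses, each a set of literals


def simplify_cnf(clauses: Iterable[Iterable[str]]) -> CNF:
    """Apply the Table 3.3 rules to a CNF given as clauses of literals."""
    kept = set()
    for clause in clauses:
        lits = frozenset(clause) - {"FALSE"}   # A ∨ FALSE ⇒ A
        if "TRUE" in lits:
            continue                           # A ∨ TRUE ⇒ TRUE, then A ∧ TRUE ⇒ A
        if not lits:
            return frozenset({frozenset()})    # A ∧ FALSE ⇒ FALSE
        kept.add(lits)                         # sets give A ∧ A ⇒ A and A ∨ A ⇒ A
    # Absorption: c ∧ (c ∨ B) ⇒ c, so drop any clause that strictly contains another.
    return frozenset(c for c in kept if not any(d < c for d in kept))


result = simplify_cnf([["P"], ["P", "Q"], ["R", "TRUE"], ["S", "FALSE"]])
print(sorted(sorted(c) for c in result))  # [['P'], ['S']]
```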

The special simplifications presented in Table 3.4 deal with
the semantics of the assertions; i.e., the fact that the
assertions contain information about the region of the real number
line to which the value of a given variable is restricted. Note
that only assertions that address the same variable can be
simplified and that only binary simplifications of ANDs and ORs are
attempted.


    V<x1,y1,z1>C1  AND  V<x2,y2,z2>C2  produces

    CONDITION                          RESULT
    If C1<0,1,0>C2                     V<x1&x2, y1&y2, z1&z2>C1
    If C1<z1&x2, 0, x1&z2>C2           No simplification possible
    Otherwise                          V<x1&x2,
                                         y1&(C1<x2,y2,z2>C2)
                                           | y2&(C2<x1,y1,z1>C1),
                                         z1&z2>
                                       (C1*(C1<x2,y2,z2>C2)
                                          + C2*(C2<x1,y1,z1>C1))

    V<x1,y1,z1>C1  OR  V<x2,y2,z2>C2  produces

    CONDITION                          RESULT
    If C1<0,1,0>C2                     V<x1|x2, y1|y2, z1|z2>C1
    If C1<(¬z1)&(¬x2), 0,
          (¬x1)&(¬z2)>C2               No simplification possible
    Otherwise                          V<x1|x2,
                                         y2&(C1<x2,y2,z2>C2)
                                           | y1&(C2<x1,y1,z1>C1)
                                           | (C1<z1&x2, 0, x1&z2>C2),
                                         z1|z2>
                                       (C1*(C2<x1,y1,z1>C1)
                                          + C2*(C1<x2,y2,z2>C2))

                        Table 3.4
                 SPECIAL SIMPLIFICATIONS


Some explanation of the notation used in Table 3.4 is needed.
The notation C1{rel1}C2 represents a Boolean function, which is
TRUE iff C1 is related to C2 by the relation {rel1}. For instance,
C1<0,1,0>C2 is TRUE iff C1 is equal to C2. Thus, the AND portion of
the table states that if C1 = C2, then the simplified assertion is
obtained by simply ANDing the separate bits of the relation. (This
combines assertions like [I≤3] ∧ [I≥3].) If the constants are
related in such a way that the regions on the real number line
overlap but neither is a subset of the other, then no
simplification can be performed. (This addresses assertions like
[I>0] ∧ [I<10].) Otherwise, a simplification can be performed as
indicated by the formula given. (This combines assertions like
[I<5] ∧ [I<3].) The value of the resultant constant used in this
third case is dependent on the relationship of the constants (TRUE
is interpreted as numeric 1; FALSE as numeric 0), and effectively
calculates either the minimum or maximum of C1 and C2. The tests of
the relationship of C1 and C2 must be applied in the order
presented in the table.
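Table 3.4 is mechanical enough to transcribe directly. The sketch below is one reading of the table, assuming that a basic assertion about a fixed variable V is held as a pair (relation triple, constant) and that TRUE/FALSE count as 1/0 in the constant formula, as the text specifies; `None` stands for "no simplification possible". The helper names are hypothetical.

```python
from typing import Optional, Tuple

Rel = Tuple[int, int, int]   # <x,y,z> flags for "<", "=", ">"
Basic = Tuple[Rel, int]      # V<x,y,z>C for one fixed variable V (assumed encoding)


def rel(c1: int, r: Rel, c2: int) -> int:
    """The Boolean function C1<x,y,z>C2, returned as 1 or 0."""
    x, y, z = r
    return int(bool((x and c1 < c2) or (y and c1 == c2) or (z and c1 > c2)))


def and_simplify(a: Basic, b: Basic) -> Optional[Basic]:
    (x1, y1, z1), c1 = a
    (x2, y2, z2), c2 = b
    if c1 == c2:                                   # AND the separate bits
        return (x1 & x2, y1 & y2, z1 & z2), c1
    if rel(c1, (z1 & x2, 0, x1 & z2), c2):         # overlap, neither contains the other
        return None
    t1 = rel(c1, (x2, y2, z2), c2)                 # C1 satisfies assertion b
    t2 = rel(c2, (x1, y1, z1), c1)                 # C2 satisfies assertion a
    return (x1 & x2, (y1 & t1) | (y2 & t2), z1 & z2), c1 * t1 + c2 * t2


def or_simplify(a: Basic, b: Basic) -> Optional[Basic]:
    (x1, y1, z1), c1 = a
    (x2, y2, z2), c2 = b
    if c1 == c2:                                   # OR the separate bits
        return (x1 | x2, y1 | y2, z1 | z2), c1
    if rel(c1, ((1 - z1) & (1 - x2), 0, (1 - x1) & (1 - z2)), c2):
        return None                                # disjoint regions with a gap
    t1 = rel(c2, (x1, y1, z1), c1)                 # C2 satisfies assertion a
    t2 = rel(c1, (x2, y2, z2), c2)                 # C1 satisfies assertion b
    y = (y2 & t2) | (y1 & t1) | rel(c1, (z1 & x2, 0, x1 & z2), c2)
    return (x1 | x2, y, z1 | z2), c1 * t1 + c2 * t2


LT, LE, EQ, GE, GT = (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1)
print(and_simplify((LT, 5), (LT, 3)))   # ((1, 0, 0), 3)   [I<5] ∧ [I<3]
print(and_simplify((LE, 3), (GE, 3)))   # ((0, 1, 0), 3)   [I≤3] ∧ [I≥3]
print(and_simplify((GT, 0), (LT, 10)))  # None             [I>0] ∧ [I<10]
print(or_simplify((LT, 5), (EQ, 3)))    # ((1, 0, 0), 5)   [I<5] ∨ [I=3]
```

A result triple of <1,1,1> or <0,0,0> would then be rewritten as [TRUE] or [FALSE], at which point its constant no longer matters.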

Of course, more specialized simplifications can be developed
and applied to the assertions, but the simplifications presented
here seem sufficient to successfully analyze the sample programs to
which the invariant assertion analysis has been applied.

Flow Function

The evaluation of the flow function is performed at STEP1 of
ASSERT_GEN. Before going into the details of how execution
assertions are generated, a short overview of the four steps that
are involved in the evaluation is presented.

1)  The input assertion, which is externally held in
    conjunctive normal form, is converted to disjunctive normal
    form

    $$D = \bigvee_{i=1}^{n} D_i, \qquad D_i = \bigwedge_{k=1}^{m_i} D_{i,k},$$

    where the $D_{i,k}$ are basic assertions.

2)  For each $D_i$, an execution assertion

    $$E_i = \bigwedge_{p=1}^{q_i} E_{i,p}$$

    is generated, where each $E_{i,p}$ is an elementary execution
    assertion. Each $E_{i,p}$ is determined by analyzing the
    expression tree of an expression associated with the node
    within the context of a portion of the input assertion
    ($D_i$).

3)  The corresponding $D_i$ and $E_i$ are combined to make them
    compatible. Let f be the combining function. Then

    $$F_i = f(E_i, D_i) = \bigwedge_{p=1}^{q_i} f(E_{i,p}, D_i).$$

4)  $F = \bigvee_{i=1}^{n} F_i$ is converted to conjunctive normal form
    and becomes the output assertion that is propagated to a
    specific successor.

The generalized logic employed in the above steps is the
following. The input assertion represents a statement that is true
before execution of the statement corresponding to the node; i.e.,
either $D_1$ is true, or $D_2$ is true, ..., or $D_n$ is true. For
each $D_i$ an execution assertion, $E_i$, is generated. This $E_i$
encodes the essence of the data manipulation that occurred during
the execution of the node, and is combined with $D_i$ (yielding
$F_i$) to determine the assertion that is true upon completion of
the execution of the node. Since one of the $D_i$ was true before
execution, one of the $F_i$ must be true after execution. Thus
$\bigvee_{i=1}^{n} F_i$, after being converted to conjunctive normal
form, becomes the output assertion.

Steps 1 and 4 are well understood and will not be discussed.
The remainder of this subsection is devoted to steps 2 and 3;
i.e., how are the execution assertions generated, and how are they
combined with the input assertion?
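Although steps 1 and 4 are standard, a short sketch shows why they can be expensive. Assuming a hypothetical representation of an assertion as a set of sets of basic-assertion strings, converting from conjunctive to disjunctive form picks one literal from every clause; by duality, the same function converts disjunctive form back to conjunctive form. No simplification is attempted, so redundant terms survive.

```python
from itertools import product
from typing import FrozenSet

Form = FrozenSet[FrozenSet[str]]  # outer/inner connectives depend on the normal form


def flip_normal_form(form: Form) -> Form:
    """CNF -> DNF (step 1) or, by duality, DNF -> CNF (step 4).

    Each result term takes one literal from every term of the input,
    i.e. the distributive law applied exhaustively.
    """
    return frozenset(frozenset(choice) for choice in product(*form))


cnf = frozenset({frozenset({"J=3", "J>5"}), frozenset({"J<9"})})
dnf = flip_normal_form(cnf)
print(sorted(sorted(t) for t in dnf))
# [['J<9', 'J=3'], ['J<9', 'J>5']]
```

For the input assertion ([J=3] ∨ [J>5]) ∧ [J<9], the result still contains [J=3] ∧ [J<9], which the special simplifications of Table 3.4 reduce to [J=3].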
Execution Assertions

Elementary execution assertions are of the form:

    <exec assert>  ::= <new assert>
                    |  <rel assert>
                    |  <no assert>
                    |  <delta assert>
                    |  <pass assert>
    <new assert>   ::= new(<var>) <rel> <const>
    <delta assert> ::= delta(<var>) <rel> <const>⁷
    <no assert>    ::= noinfo(<var>)
    <pass assert>  ::= pass

Examples of elementary execution assertions:

    Assertion        Derived from
    new(X) = 5       X = 5
    delta(X) = 1     X = X + 1
    X ≥ 10           IF (X.GE.10), true exit
    X < 10           IF (X.GE.10), false exit
    noinfo(X)        READ, X
    pass             PRINT, X

The particular form of the execution assertions presented here
was chosen because they are sufficient to describe the data
relationships produced by the three basic types of statements
addressed in this thesis, namely assignment statements, conditional


⁷ The form of a <delta assert> as shown here relates the change in
a variable to an arbitrary constant. This is done for consistency
with the other assertion forms. A modification will be introduced
later in which the constant is restricted to the constant zero.
tests, and I/O statements. Table 3.5 specifies the type of
execution assertion generated as a function of the type of
statement and input assertion.

    Statement          Assertion        Comments

    Assignment (to V)  <new assert>     Generated if a sufficient
                                        amount of information can be
                                        extracted from the input
                                        assertion about the assignment
                                        expression.

                       <delta assert>   Generated if V is an additive
                                        component of the assignment
                                        expression and if a sufficient
                                        amount of information can be
                                        determined about the remainder
                                        of the expression.

                       <no assert>      Generated if there is not
                                        enough information in the
                                        input assertion to estimate
                                        the value of the assignment
                                        expression.

    Input              <no assert>      Nothing is known about the
                                        value of the variable read
                                        from an external medium.

    Output             <pass assert>    The output statement does not
                                        change the value of any
                                        variable.

    Cond. test         <rel assert>     No variable values are
                                        changed. A <rel assert> is
                                        generated for each successor
                                        and states the assertion that
                                        is true if the corresponding
                                        branch is taken. Only a
                                        conditional test can generate
                                        a <rel assert>.

                       <no assert>      Generated if no information
                                        can be determined about the
                                        variable being tested.

                        Table 3.5
                  EXECUTION ASSERTIONS
First, consider statements that modify the value of variables,
i.e., assignment statements and input statements. If some
information about the new value of the variable can be determined,
then a <new assert> or a <delta assert> is generated; otherwise, a
<no assert> is generated. A <new assert> states that the new value
of a variable is related to a constant by a given relation. A
<delta assert> states that the new value of the variable is related
to the old value by an additive constant. The <delta assert>
encodes the constant and relation. A <no assert> states that no
information can be determined about the new value of the variable.
For instance, a <no assert> is always produced for a variable in an
input statement.

Output statements and conditional tests do not modify the
value of any variables. For an output statement, the input
assertion should be passed through to become the output assertion
since no pertinent action occurs during the execution of the
statement. A <pass assert> is generated for this purpose. Within
a conditional test, each exit edge corresponds to a specific
condition being true at execution time; this condition is encoded
in a <rel assert>.
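For a conditional test that compares a variable with a constant, the two exit edges carry complementary <rel assert>s: the triple for the false exit is the bitwise complement of the triple for the true exit, as in the IF (X.GE.10) example of the execution assertions. A minimal sketch with a hypothetical helper `branch_assertions`, assuming this simple variable-versus-constant form and the FORTRAN relational operators:

```python
from typing import Tuple

Rel = Tuple[int, int, int]          # <x,y,z> flags for "<", "=", ">"
Basic = Tuple[str, Rel, int]        # [V <x,y,z> C] (assumed encoding)

FORTRAN_REL = {
    ".LT.": (1, 0, 0), ".LE.": (1, 1, 0), ".EQ.": (0, 1, 0),
    ".NE.": (1, 0, 1), ".GT.": (0, 0, 1), ".GE.": (0, 1, 1),
}


def branch_assertions(var: str, op: str, const: int) -> Tuple[Basic, Basic]:
    """<rel assert>s for the true and false exits of IF (var op const)."""
    x, y, z = FORTRAN_REL[op]
    true_exit = (var, (x, y, z), const)
    false_exit = (var, (1 - x, 1 - y, 1 - z), const)  # complement of the relation
    return true_exit, false_exit


print(branch_assertions("X", ".GE.", 10))
# (('X', (0, 1, 1), 10), ('X', (1, 0, 0), 10))
```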

EXECUTE 

One of the duties of EXECUTE is to determine an estimate for the
value of a given expression and generate an execution assertion
based on this information and the data transfer (or test) executed
within the source statement. This analysis is performed for each
user-defined or system-defined variable that is assigned or tested
within a given statement. For instance, in the assignment
statement "X = Y + 1," the value of the expression "Y + 1" must be
estimated before an assertion about the value of X can be
generated. On the basis of the value of this expression and the
fact that this is an assignment statement, an appropriate execution
assertion can be generated.

In order to generate execution assertions, EXECUTE must be
able to analyze the source statement corresponding to a node, n,
within the context of the input assertion attached to node n. For
instance, consider the following code segment with associated
assertions; we wish to determine the execution assertion produced
within S2.

    Statement            Assertion
    S1   X = 5           [Y>0]
    S2   X = X + Y       [Y>0] ∧ [X=5]

In order to generate the execution assertion delta(X)>0 at S2,
EXECUTE must be able to:

•  determine that the target variable (X) is an additive
   component of the expression on the right hand side of the
   assignment (thus, a <delta assert> is applicable), and

•  determine that the rest of the expression (exclusive of X)
   is positive.

The expression tree of the expression to be analyzed is the
basic data structure needed to perform the analysis. Basic
assertions extracted from the input assertion are attached to the
leaves of the expression tree. Assertions are propagated up the
levels of the tree to the root node by applying the transformations
presented in Tables 3.6, 3.7, and 3.8. These tables state how
assertions attached to two expressions are combined as the
expressions are added, subtracted, multiplied or divided to form a
new expression. (In these tables, • stands for the value of the
expression to which the relation is attached.) Of course, a number
of other operations, such as exponentiation, unary minus, minimum
and maximum, square root, etc., can be addressed in the same
manner.


    •<x1,y1,z1>C1  +  •<x2,y2,z2>C2  produces

        •<x1|x2, y1&y2|x1&z2|z1&x2, z1|z2>(C1 + C2)

    •<x1,y1,z1>C1  -  •<x2,y2,z2>C2  produces

        •<x1|z2, y1&y2|x1&x2|z1&z2, z1|x2>(C1 - C2)

                        Table 3.6
           ASSERTIONS COMBINED AT + AND - NODES


As a concrete example of how such an analysis is performed,
consider the assignment statement "I = J + 1" with input assertion
([J=3] ∨ [J>5]) ∧ [J<9]. When converted to disjunctive normal
form (and simplified), this assertion becomes [J=3] ∨ ([J>5] ∧
[J<9]). The expression to be analyzed is "J + 1" (see Figure
3.9(A)). Three separate analyses (shown in Figure 3.9(B,C,D)) are
performed, obtaining the three elementary execution assertions
new(I)=4, new(I)>6, and new(I)<10.
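The two rules of Table 3.6 and the worked example can be checked with a few lines of Python. Annotations are written as (triple, constant) pairs standing for •<x,y,z>C; this representation and the function names are assumptions made for the sketch.

```python
from typing import Tuple

Rel = Tuple[int, int, int]   # <x,y,z> flags for "<", "=", ">"
Annot = Tuple[Rel, int]      # •<x,y,z>C attached to a tree node (assumed encoding)


def add(a: Annot, b: Annot) -> Annot:
    """Table 3.6, + node."""
    (x1, y1, z1), c1 = a
    (x2, y2, z2), c2 = b
    return (x1 | x2, (y1 & y2) | (x1 & z2) | (z1 & x2), z1 | z2), c1 + c2


def sub(a: Annot, b: Annot) -> Annot:
    """Table 3.6, - node."""
    (x1, y1, z1), c1 = a
    (x2, y2, z2), c2 = b
    return (x1 | z2, (y1 & y2) | (x1 & x2) | (z1 & z2), z1 | x2), c1 - c2


LT, EQ, GT = (1, 0, 0), (0, 1, 0), (0, 0, 1)
one = (EQ, 1)                                # the constant leaf "1"
for leaf in [(EQ, 3), (GT, 5), (LT, 9)]:     # the leaf J under each analysis
    print(add(leaf, one))
# ((0, 1, 0), 4)    i.e. new(I)=4
# ((0, 0, 1), 6)    i.e. new(I)>6
# ((1, 0, 0), 10)   i.e. new(I)<10
```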

Certain optimizations can be applied to reduce the amount of
analysis performed. For example, even though the separate
disjuncts of an input assertion contain different information, the
information extracted and placed on the expression tree may be
identical for some set of disjuncts; only one elementary execution
analysis need be done for this set of disjuncts. Such a situation


    •<x1,y1,z1>C1  *  •<x2,y2,z2>C2  produces

    RESULT                                        CONDITION
    •<x1|x2, y1&y2|x1&z2|z1&x2|x1&x2,
      z1|z2|x1&x2>(C1*C2)                         C1 > 0, C2 > 0
    •<z1|z2, y1&y2|z1&x2|x1&z2|z1&z2,
      x1|x2|z1&z2>(C1*C2)                         C1 < 0, C2 < 0
    •<z1|x2|x1&z2, y1&y2|x1&x2|z1&z2|x1&z2,
      x1|z2>(C1*C2)                               C1 > 0, C2 < 0
    •<x1|z2|z1&x2, y1&y2|z1&z2|x1&x2|z1&x2,
      z1|x2>(C1*C2)                               C1 < 0, C2 > 0
    •<x2|x1&z2, y2|x1, z2|x1&x2>0                 C1 > 0, C2 = 0
    •<z2|z1&x2, y2|z1, x2|z1&z2>0                 C1 < 0, C2 = 0
    •<x1|x2&z1, y1|x2, z1|x2&x1>0                 C1 = 0, C2 > 0
    •<z1|z2&x1, y1|z2, x1|z2&z1>0                 C1 = 0, C2 < 0
    •<x1&z2|z1&x2, y1|y2, x1&x2|z1&z2>0           C1 = 0, C2 = 0

                        Table 3.7
              ASSERTIONS COMBINED AT * NODES

would occur if the input assertion used to estimate the value of
"J + 1" in Figure 3.9 were

    [J=3] ∧ ([K=5] ∨ [K=9]) = ([J=3] ∧ [K=5]) ∨ ([J=3] ∧ [K=9]).

For both disjuncts, [J=3] is the information that is extracted and
used in the evaluation of the expression. Clearly, only one
analysis need be performed. A second optimization can be applied
if the expression to be analyzed is a constant. Clearly, the input
assertion is not used in this case and a degenerate analysis can be
applied.
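Table 3.7 can likewise be transcribed as a case analysis on the signs of C1 and C2. The sketch below uses (triple, constant) pairs for •<x,y,z>C, an assumed representation, and the hypothetical name `mul`; Table 3.8 for division would follow the same pattern with its own nine rows.

```python
from typing import Tuple

Rel = Tuple[int, int, int]   # <x,y,z> flags for "<", "=", ">"
Annot = Tuple[Rel, int]      # •<x,y,z>C (assumed encoding)


def mul(a: Annot, b: Annot) -> Annot:
    """Table 3.7: combine •<x1,y1,z1>C1 * •<x2,y2,z2>C2.

    Python's & binds tighter than |, matching how the table is written.
    """
    (x1, y1, z1), c1 = a
    (x2, y2, z2), c2 = b
    if c1 > 0 and c2 > 0:
        r = (x1 | x2, y1 & y2 | x1 & z2 | z1 & x2 | x1 & x2, z1 | z2 | x1 & x2)
    elif c1 < 0 and c2 < 0:
        r = (z1 | z2, y1 & y2 | z1 & x2 | x1 & z2 | z1 & z2, x1 | x2 | z1 & z2)
    elif c1 > 0 and c2 < 0:
        r = (z1 | x2 | x1 & z2, y1 & y2 | x1 & x2 | z1 & z2 | x1 & z2, x1 | z2)
    elif c1 < 0 and c2 > 0:
        r = (x1 | z2 | z1 & x2, y1 & y2 | z1 & z2 | x1 & x2 | z1 & x2, z1 | x2)
    elif c1 > 0:   # here c2 == 0
        return (x2 | x1 & z2, y2 | x1, z2 | x1 & x2), 0
    elif c1 < 0:   # here c2 == 0
        return (z2 | z1 & x2, y2 | z1, x2 | z1 & z2), 0
    elif c2 > 0:   # here c1 == 0
        return (x1 | x2 & z1, y1 | x2, z1 | x2 & x1), 0
    elif c2 < 0:   # here c1 == 0
        return (z1 | z2 & x1, y1 | z2, x1 | z2 & z1), 0
    else:          # c1 == c2 == 0
        return (x1 & z2 | z1 & x2, y1 | y2, x1 & x2 | z1 & z2), 0
    return r, c1 * c2


print(mul(((0, 1, 0), 3), ((0, 0, 1), 2)))   # ((0, 0, 1), 6): 3 * (>2) is >6
```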

The execution assertions must be combined with the disjuncts
of the input assertion that were used to generate them. The input
assertion, D, is of the form

    $$D = \bigvee_{i=1}^{n} D_i, \qquad D_i = \bigwedge_{k=1}^{m_i} D_{i,k},$$

where the $D_{i,k}$ are basic assertions. (In the example associated with
    •<x1,y1,z1>C1  /  •<x2,y2,z2>C2  produces

    RESULT                                        CONDITION
    •<x1|y1&z2|z1&z2, x1&x2|y1&y2|z1&z2,
      z1|x1&x2|y1&x2>(C1/C2)                      C1 > 0, C2 > 0
    •<z1|y1&x2|x1&x2, z1&z2|y1&y2|x1&x2,
      x1|z1&z2|y1&z2>(C1/C2)                      C1 < 0, C2 < 0
    •<x1|z1&x2|y1&x2, z1&x2|y1&y2|x1&z2,
      z1|y1&z2|x1&z2>(C1/C2)                      C1 < 0, C2 > 0
    •<z1|x1&z2|y1&z2, x1&z2|y1&y2|z1&x2,
      x1|y1&x2|z1&x2>(C1/C2)                      C1 > 0, C2 < 0
    •<x1|z1&x2, y1, z1|x1&x2>0                    C1 = 0, C2 > 0
    •<z1|x1&z2, y1, x1|z1&z2>0                    C1 = 0, C2 < 0
    •<x2|x1&z2, x1, z2|x1&x2>0                    C1 > 0, C2 = 0
    •<z2|z1&x2, z1, x2|z1&z2>0                    C1 < 0, C2 = 0
    •<x1&z2|z1&x2, y1, x1&x2|z1&z2>0              C1 = 0, C2 = 0

                        Table 3.8
              ASSERTIONS COMBINED AT / NODES


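    [Figure 3.9: expression trees for "J + 1". Panel (A) shows the
    tree for the expression; panels (B), (C), and (D) show the three
    analyses, in which the leaf assertions J(•=3), J(•>5), or J(•<9)
    and 1(•=1) yield the root assertions (•=4), (•>6), and (•<10).]

                        Figure 3.9
           EXECUTION ANALYSIS EXPRESSION TREES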
Figure 3.9, $D_1$ = [J=3] and $D_2$ = [J>5] ∧ [J<9].) Let f be the
combining function. Then

    $$f(E_i, D_i) = \bigwedge_{p=1}^{q_i} f(E_{i,p}, D_i).$$

A partial evaluation of $f(E_{i,p}, D_i)$ is shown in (EQ3.1).
(EQ3.2) and (EQ3.3) further specify the evaluation, and the
evaluation of f on elementary and basic assertions is shown in
Table 3.9.

(EQ3.1)
$$f(E_{i,p}, D_i) = \begin{cases}
E_{i,p} \wedge D_i & \text{if } E_{i,p} \text{ is a } \langle\text{rel assert}\rangle; \\
D_i & \text{if } E_{i,p} \text{ is a } \langle\text{pass assert}\rangle; \\
\text{(EQ3.2)} & \text{if } E_{i,p} \text{ is a } \langle\text{new assert}\rangle; \\
\text{(EQ3.3)} & \text{if } E_{i,p} \text{ is a } \langle\text{no assert}\rangle; \\
\text{(EQ3.3)} & \text{if } E_{i,p} \text{ is a } \langle\text{delta assert}\rangle.
\end{cases}$$

(EQ3.2) Assume that $E_{i,p}$ is new(V)<x,y,z>C. Then
$$f(E_{i,p}, D_i) = [V\langle x,y,z\rangle C] \wedge \bigwedge_{k=1}^{m_i} f(E_{i,p}, D_{i,k}).$$

(EQ3.3)
$$f(E_{i,p}, D_i) = \bigwedge_{k=1}^{m_i} f(E_{i,p}, D_{i,k}).$$

Now that each execution assertion has been combined with its
corresponding disjunct of the input assertion, the output assertion

    $$F = \bigvee_{i=1}^{n} f(E_i, D_i)$$

is computed. Of course, this is converted to conjunctive normal
form before it is propagated to the corresponding successor.
Continuing with the example associated with Figure 3.9,

    F = ([I=4] ∧ [J=3]) ∨ ([I>6] ∧ [I<10] ∧ [J>5] ∧ [J<9]).

After conversion to conjunctive normal form, this becomes the
output assertion.

    Elementary execution   Basic assertion        Value of f
    assertion

    new(V)<x1,y1,z1>C1     [V<x2,y2,z2>C2]        [TRUE]
                           assertion (A) not      (A)
                           involving V

    noinfo(V)              [V<x,y,z>C]            [TRUE]
                           assertion (A) not      (A)
                           involving V

    delta(V)<x1,y1,z1>C1   [V<x2,y2,z2>C2]        [V<x1|x2,
                                                    y1&y2|x1&z2|z1&x2,
                                                    z1|z2>(C1+C2)]
                           assertion (A) not      (A)
                           involving V

                        Table 3.9
                     EVALUATION OF F


A few words should be said about why the function f evaluates
to TRUE on a basic assertion involving V when $E_{i,p}$ is a
<new assert> or a <no assert>. Consider specifically the noinfo(V)
entry of Table 3.9. Each $D_{i,k}$ of $D_i$ represents an assertion
collected along a particular control path to the node, n, currently
being processed. Since the execution of the statement corresponding
to node n destroys the current value of variable V, whatever
information was known about V before execution is no longer valid
after execution. If one (or more) of the $D_{i,k}$ contains
information about variable V, then it must be deleted from the
conjunction in the output assertion being generated. This is
effectively done by changing all $D_{i,k}$ involving V to TRUE.
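Equations (EQ3.1) to (EQ3.3) and Table 3.9 can be combined into a single sketch of f. The encodings are assumptions for illustration: a basic assertion is a tuple (variable, triple, constant), a disjunct $D_i$ is a list of such tuples read as a conjunction, and elementary execution assertions are tagged tuples such as ("new", basic), ("delta", basic), ("noinfo", variable), ("rel", basic), or ("pass",). The last lines reproduce the two disjuncts of the Figure 3.9 example; the delta case is exercised in the tests with the delta(X)>0 assertion of statement S2.

```python
from typing import List, Optional, Tuple

Rel = Tuple[int, int, int]          # <x,y,z> flags for "<", "=", ">"
Basic = Tuple[str, Rel, int]        # [V <x,y,z> C] (assumed encoding)


def f_basic(e: tuple, d: Basic) -> Optional[Basic]:
    """f(E_ip, D_ik) from Table 3.9; None stands for [TRUE]."""
    kind = e[0]
    v = e[1] if kind == "noinfo" else e[1][0]
    var, (x2, y2, z2), c2 = d
    if var != v:
        return d                              # (A) not involving V is kept
    if kind in ("new", "noinfo"):
        return None                           # old facts about V become [TRUE]
    _, (x1, y1, z1), c1 = e[1]                # delta(V)<x1,y1,z1>C1: add as in Table 3.6
    return var, (x1 | x2, (y1 & y2) | (x1 & z2) | (z1 & x2), z1 | z2), c1 + c2


def f_elem(e: tuple, d: List[Basic]) -> List[Basic]:
    """f(E_ip, D_i) from (EQ3.1)-(EQ3.3); the result is a conjunction."""
    kind = e[0]
    if kind == "pass":
        return list(d)
    if kind == "rel":
        return [e[1]] + list(d)
    kept = [r for r in (f_basic(e, b) for b in d) if r is not None]
    return [e[1]] + kept if kind == "new" else kept   # (EQ3.2) or (EQ3.3)


def f(es: List[tuple], d: List[Basic]) -> List[Basic]:
    """f(E_i, D_i): conjunction over p of f(E_ip, D_i), duplicates removed."""
    out: List[Basic] = []
    for e in es:
        for b in f_elem(e, d):
            if b not in out:
                out.append(b)
    return out


D1 = [("J", (0, 1, 0), 3)]
E1 = [("new", ("I", (0, 1, 0), 4))]
D2 = [("J", (0, 0, 1), 5), ("J", (1, 0, 0), 9)]
E2 = [("new", ("I", (0, 0, 1), 6)), ("new", ("I", (1, 0, 0), 10))]
print(f(E1, D1))  # [('I', (0, 1, 0), 4), ('J', (0, 1, 0), 3)]
print(f(E2, D2))
```

ORing the two results gives ([I=4] ∧ [J=3]) ∨ ([I>6] ∧ [I<10] ∧ [J>5] ∧ [J<9]), the F computed for Figure 3.9.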
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L§ttic§  Properties  And  Convergence 


The information space of assertions under the meet operation of logical OR has the associative and commutative properties required of a lattice space. Recall from the general lattice theory discussion that the flow function, f, must have the homomorphism property

$$f(n, s, a \sqcap b) = f(n, s, a) \sqcap f(n, s, b).$$

In this framework, the meet operation is logical OR, and the above homomorphism property states that the following two results are equivalent:

1) Apply EXECUTE to the assertion A ∨ B in order to obtain the output assertion.

2) Apply EXECUTE to A and B separately and OR the output assertions obtained.
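The equivalence of these two results can be seen on a toy model. The sketch below is not the thesis's EXECUTE. It assumes that an assertion about a single variable I is represented by the finite set of values I may hold. Under that assumption, logical OR is set union, and executing `I = I + 1` maps every value to its successor. Any function defined value by value in this way distributes over union.

```python
from typing import Callable, FrozenSet

Assertion = FrozenSet[int]  # values I may hold; frozenset() plays the role of [FALSE]


def meet(a: Assertion, b: Assertion) -> Assertion:
    """Meet of two assertions: logical OR, i.e. union of the value sets."""
    return a | b


def execute(stmt: Callable[[int], int], a: Assertion) -> Assertion:
    """Toy EXECUTE: apply the statement's effect to every value allowed by a."""
    return frozenset(stmt(v) for v in a)


def increment(i: int) -> int:
    """Models the statement I = I + 1."""
    return i + 1


A = frozenset({1, 2})   # [I=1] OR [I=2]
B = frozenset({5})      # [I=5]

# 1) EXECUTE applied to A OR B
lhs = execute(increment, meet(A, B))
# 2) EXECUTE applied to A and B separately, the results ORed
rhs = meet(execute(increment, A), execute(increment, B))
```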

Upon inspection, EXECUTE has this property. Thus, we are guaranteed that the analysis converges in a finite time. This realization of invariant assertion analysis does not, however, satisfy condition (*), so processing the nodes in reverse postorder does not guarantee the d(G,T) + 1 upper bound on convergence. Even though we cannot guarantee rapid convergence, processing the nodes in reverse postorder does provide a desirably structured framework in which to embed the analysis.

Modifications 

Small modifications can be made to the system of assertions presented in this section in order to "tune" the analysis to produce more desirable final assertions. For instance, consider the two identical programs (with final assertions attached) shown in Figures 3.10 and 3.11, and note that the attached assertions differ slightly. The assertions shown in Figure 3.10 were

Statement                          Assertion

S1       READ, A                    [TRUE]
S2       I = 1                      [TRUE]
S3       SUM = 0                    [I=1]
S4   10  SUM = SUM + A(I)           [I<10]
S5       I = I + 1                  [I<10]
S6       IF (I.LE.10) GOTO 10       [I>1] & [I<11]
S7       PRINT, A(I)                [I>10]

Figure 3.10
STANDARD EXECUTION ASSERTIONS

Statement                          Assertion

S1       READ, A                    [TRUE]
S2       I = 1                      [TRUE]
S3       SUM = 0                    [I=1]
S4   10  SUM = SUM + A(I)           [I>1] & [I<10] & ([SUM=0] ∨ [I>1])
S5       I = I + 1                  [I>1] & [I<10]
S6       IF (I.LE.10) GOTO 10       [I>1]
S7       PRINT, A(I)                [I>10]

Figure 3.11
MODIFIED EXECUTION ASSERTIONS

generated by applying the invariant assertion analysis exactly as previously specified in this section; specifically, the execution assertion generated by S5 is delta(I) = 1. The assertions shown in Figure 3.11 used the same analysis, except that the execution assertion used at S5 was taken to be delta(I) > 0, a slightly weaker assertion. Note that, as a result, an additional basic assertion of [I>1] is attached to S4 and S5. Since this is a desirable fact to know (i.e., to have both an upper and a lower bound for an index variable), the analysis of Figure 3.11 is preferred. (However, note that the assertion [I<11] is lost at S6.)

The analysis used in Figure 3.11 is accomplished by modifying the execution assertions such that a <delta assert> can only encode the relationship of the change in a variable to the constant zero (instead of an arbitrary constant); i.e.,

<delta assert> ::= delta(<var>) <rel> 0.
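Below is a minimal sketch of the restricted form. It assumes that the analysis first finds the constant change c produced by a statement such as I = I + c and then keeps only the sign of c. It also assumes, as a simplification, that the assertions about I are held as a pair of optional integer bounds rather than as general assertions. The names `restrict_delta` and `apply_delta` are hypothetical.

```python
from typing import Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]  # (lower, upper); None means unbounded


def restrict_delta(c: int) -> str:
    """Weaken delta(V) = c to the restricted form delta(V) <rel> 0."""
    if c > 0:
        return ">"
    if c < 0:
        return "<"
    return "="


def apply_delta(bounds: Bounds, rel: str) -> Bounds:
    """Propagate integer bounds on V through delta(V) <rel> 0.
    For integers, delta > 0 means the new value is at least old + 1,
    but nothing bounds it from above any more (and symmetrically for <)."""
    lo, hi = bounds
    if rel == ">":
        return (None if lo is None else lo + 1, None)
    if rel == "<":
        return (None, None if hi is None else hi - 1)
    return bounds
```

For example, `apply_delta((1, 10), restrict_delta(1))` gives `(2, None)`. The lower bound survives, while the upper bound is lost, much as [I<11] is lost at S6 in Figure 3.11.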

A second modification is to generate a <pass assert> execution assertion instead of a <no assert> when no new information can be determined about the variable being tested in a conditional test. The output assertion generated when this modification is incorporated is still valid because the execution of the conditional statement does not change the value of any variables, and the input assertion is simply passed through to become the output assertion.

The reason that a <no assert> was included as a possible execution assertion of a conditional test in the original specifications was to increase the speed of convergence of the algorithm. Consider the program segment in Figure 3.12, in which the original form of a <delta assert> has been applied. The assertions shown at the right are those attached to the statements after the first pass of the invariant assertion analysis. Note that the input assertion on S6 is [I=2], which occurs because of the execution assertion "delta(I) = 1" generated at S5. What execution assertion should be generated at S6, since nothing is known about the value of N?


If a <pass assert> is generated at S6, then the output assertion [I=2] is propagated to S4 and S7. When this assertion is ORed at S4 and propagated by the second pass of the algorithm, the resulting input assertion at S6 becomes [I=2] ∨ [I=3]. After the third pass, the input assertion at S6 will be [I=2] ∨ [I=3] ∨ [I=4]. This process will continue until the largest representable number on the machine is reached, which is clearly an undesirable property of the analysis. Note that the assertions generated are, in fact,

Statement                          Assertion

S1       READ, N, (A(J), J=1, N)    [TRUE]
S2       SUM = 0                    [TRUE]
S3       I = 1                      [SUM=0]
S4   30  SUM = SUM + A(I)           [I=1] & [SUM=0]
S5       I = I + 1                  [I=1]
S6       IF (I.LE.N) GOTO 30        [I=2]
S7       PRINT, SUM                 [FALSE]
S8       STOP                       [FALSE]

Figure 3.12
CONVERGENCE PROBLEMS

correct, but for all practical purposes the analysis does not converge.


If an execution assertion of noinfo(I) is generated at S6, then an output assertion of [TRUE] is propagated to S4 and S7. In this case, the analysis converges during the second pass, with the input assertion [TRUE] attached to S4 through S8.
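The two behaviours can be reproduced with a small simulation of the loop S4 -> S5 -> S6 -> S4 of Figure 3.12. This is a sketch under simplifying assumptions, not the thesis's algorithm. It tracks only the variable I. An assertion about I is a set of possible values, or `None` for [TRUE]. The parameter `max_int` is a hypothetical stand-in for the largest representable number. Passes are counted up to and including the final pass that detects no change.

```python
from typing import FrozenSet, Optional, Tuple

# An assertion about I: a frozenset of possible values, or None for [TRUE]
# (nothing known).  frozenset() plays the role of [FALSE].
Assertion = Optional[FrozenSet[int]]
TRUE: Assertion = None


def meet(a: Assertion, b: Assertion) -> Assertion:
    """Logical OR of two assertions."""
    if a is TRUE or b is TRUE:
        return TRUE
    return a | b


def delta_one(a: Assertion, max_int: int) -> Assertion:
    """Execution assertion delta(I) = 1 at S5; values above max_int
    cannot be represented and are dropped."""
    if a is TRUE:
        return TRUE
    return frozenset(v + 1 for v in a if v + 1 <= max_int)


def analyze(s6_kind: str, max_int: int) -> Tuple[int, Assertion]:
    """s6_kind is "pass" (<pass assert>) or "noinfo" (noinfo(I)) at S6.
    Returns the number of passes and the input assertion at S6."""
    back_edge: Assertion = frozenset()   # S6 -> S4 edge starts as [FALSE]
    s4_in: Assertion = frozenset()
    passes = 0
    while True:
        passes += 1
        new_s4_in = meet(frozenset({1}), back_edge)   # S3 supplies [I=1]
        s6_in = delta_one(new_s4_in, max_int)
        back_edge = s6_in if s6_kind == "pass" else TRUE
        if new_s4_in == s4_in:
            return passes, s6_in
        s4_in = new_s4_in
```

With a <pass assert>, the number of passes grows with the largest representable number: `analyze("pass", 100)` needs 101 passes. With noinfo(I), `analyze("noinfo", 100)` returns `(3, None)`. The assertions settle on the second pass, and the third pass only confirms that nothing changed.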


It should now be clear why a <no assert> was included as a possible execution assertion of a conditional test in the original specifications. However, its inclusion can delete useful information from the assertions attached to statements. In the example above, the information that I > 1 over a large region of the program is lost.

The reasons that a <no assert> should be replaced by a <pass assert> as an execution assertion for a conditional test are:

• the validity of the invariant assertion analysis still holds if this replacement is incorporated,

• the new form of a <delta assert> seems to resolve the convergence problem that originally caused the <no assert> to be incorporated*, and

• the deletion of useful information within other regions of the program will be eliminated if this replacement is incorporated.

Evidently, within the framework of this realization of the invariant assertion analysis, the generation of a slightly weaker execution assertion than can be produced allows the generation of a stronger output assertion. All subsequent references to the invariant assertion analysis of this section assume that these two modifications have been incorporated; i.e., a restricted form of the <delta assert> is applied, and the <no assert> execution assertion of a conditional test has been replaced by a <pass assert>.

* For all the examples in which a <no assert> was replaced by a <pass assert>, the use of the new form of the <delta assert> has always resolved convergence problems.


3.2.2.2  Extended Realizations


The realization of invariant assertion analysis presented in Section 3.2.2.1 is sufficient to detect the program defects presented in Section 2.7. Because of the form of the basic assertions used in Section 3.2.2.1, however, it is essentially impossible to determine the relationship of one variable to another; for instance, that:

• one variable is equal to another,

• one variable is a constant multiple of another, or

• one variable differs from another by a constant.

If such information is desired, the basic form of the assertions must be extended. For example, students will often use two or more variables to encode essentially the same information. A given variable, I, may contain the value of J + 1 within a region of the program. If the student references the expression "I - 1" within this region, he can be informed of its equivalence with the value held in the variable J.
One extension that allows such information to be obtained is to replace assertions as follows. Let

<rel assert> ::= <var> <rel> <right side>
<new assert> ::= new(<var>) <rel> <right side>
<delta assert> ::= delta(<var>) <rel> <right side>
<right side> ::= <const> * <var> + <const>

and apply the basic concepts presented in Section 3.2.2.1.


Such an extended invariant assertion analysis is, of course, capable of encoding more information than the analysis presented in Section 3.2.2.1. If the problem to be solved requires the ability to relate one variable to another, then such an analysis is justified. However, a significant amount of additional time and space is required to support such an extended analysis. Each basic assertion requires approximately twice as much memory space, and a great many more assertions will be generated. For instance, consider a statement of the relatively simple form

A = B + C + D.

At least three execution assertions must be generated for this assignment statement: one relating A to B, one relating A to C, and one relating A to D. Clearly, the number of execution assertions generated has been expanded by a factor determined by the number of variables present in an expression. Knuth ([KNU75]) has shown that, in practice, 85% of the expressions used in FORTRAN programs have one or fewer operators; i.e., most expressions are very simple. However, even though complicated expressions do not occur very often, the mechanism for handling them must still be present.
The simplification rules presented in Table 3.4 now become very complicated because the basic relationships involved are more complicated, and the structure of the expression tree may have to be perturbed to generate an assertion of the desired form. A new type of simplification must be added, which recognizes the transitive nature of the assertions. For instance, the assertions

[A>3*B+4] & [B>2*C+1]

must be combined to produce

[A>3*B+4] & [B>2*C+1] & [A>6*C+7].
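Here is a hedged sketch of this transitive simplification for the extended form lhs <rel> k*var + c. It deliberately handles only the relations > and ≥ and only a positive multiplier k1. Multiplying the second assertion by k1 then preserves the direction of the relation. The names `Linear` and `combine` are illustrative, not part of the thesis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Linear:
    """Extended <rel assert>: lhs <rel> k * var + c, with rel '>' or '>='."""
    lhs: str
    rel: str
    k: int
    var: str
    c: int


def combine(a: Linear, b: Linear) -> Optional[Linear]:
    """From [X rel1 k1*Y + c1] and [Y rel2 k2*Z + c2] derive
    [X rel k1*k2*Z + (k1*c2 + c1)].  Valid only when k1 > 0; the result is
    strict if either input is strict.  Returns None when not applicable."""
    if a.var != b.lhs or a.k <= 0:
        return None
    rel = ">" if ">" in (a.rel, b.rel) else ">="
    return Linear(a.lhs, rel, a.k * b.k, b.var, a.k * b.c + a.c)
```

Applied to the example above, `combine(Linear("A", ">", 3, "B", 4), Linear("B", ">", 2, "C", 1))` gives `Linear("A", ">", 6, "C", 7)`, which is [A>6*C+7], because 3*(2*C+1)+4 = 6*C+7.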

Other extensions to the realization of Section 3.2.2.1 might include:

• summation notation,

• relations other than the standard Boolean relations, or

• built-in or user-defined functions.

However, any such extension will increase the complexity of the function EXECUTE and of the set of simplifications used to combine assertions. The choice of the specific realization to implement must be determined on the basis of the time and space resources available and the specifications that the invariant assertion analysis must meet.


3.2.2.3  Summary Of Invariant Assertion Analysis

Invariant assertion analysis is a powerful tool for collecting information about a given program. The specific amount of information that can be determined is a function of the realization implemented and is, thus, dependent on the time and space resources available.

In any event, such an analysis is significantly more costly than the property P propagation techniques presented in Section 3.2.1. Thus, it is less likely that invariant assertion analysis will be implemented in an interactive compiling system. It has been included for two reasons:

1) It demonstrates the power of iterative global flow techniques for solving complex global flow problems.

2) Once an invariant assertion analysis has been performed, a number of program defects can be detected with minimal additional processing.

The concepts that have been presented in this section are language independent and apply to any procedure-oriented language. The analysis is, thus, applicable in a table-driven compiler system.

Two interesting observations should be made about invariant assertion analysis. First, the generation of assertions takes place within the framework of a general procedure-oriented language. Many assertion analysis systems are able to produce useful assertions only because they restrict their analysis to a limited realm of knowledge ([1AT76]). The invariant assertion analysis presented here is able to produce useful assertions within a reasonably general framework. Second, the analysis is accomplished by applying a local analysis at each node of the flow graph and propagating the information obtained through successor edges. Although the global structure of the program is available in the flow graph, no explicit use is made of the global structures of the program. For instance, it is not necessary to preanalyze the flow graph to determine loops and induction variables of loops ([KAT76], [WEG74]) or to determine IF-THEN-ELSE constructs. Only the local structure attached at each node is needed.


3.3    Search   Techniques 

Several applications in this thesis involve performing a search, starting at a specific node, n, for the set of nodes with property Q (an arbitrary property) that are "reachable" from node n by traversing the edges of the flow graph. Consider the following fluid flow analogy:

Assume node n is the only source of a fluid which can flow only in the direction of the edges of the flow graph. There are certain nodes, called sinks, through which fluid can exit from the system. If fluid exits at a given sink, the fluid is not propagated further. One way of partially characterizing such a system is to specify the set, S, of nodes (sinks) through which fluid actually exits.

The search technique shown in Figure 3.13 will be applied in the following situation: PROP_ALL or PROP_SOME will have been applied to propagate information through the flow graph. A specific node, n, may be of interest because of the information, z, attached to the node. The search technique to be described can then be applied to determine the source(s) of the information, z.
In this technique, only one property is addressed within a given search. A simple branching search, which visits each node no more than once, is applied.


SEARCH(G,n,X;S)
   for every q ∈ N
      q.mark <- '0'B;
   rof;

   STK <- n;  S <- ∅;
   do while (STK ≠ null)
      q <= STK;
      for every x ∈ X-cessor(q)
         if ¬x.mark then
            x.mark <- '1'B;
            if x.s then S <- S ∪ {x};
            else STK <= x; fi;
         fi;
      rof;
   od;
END SEARCH

Figure 3.13
SPECIFICATION OF SEARCH


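In SEARCH (Figure 3.13):

• S is the set of sink nodes "reachable" from node n,

• X selects the direction of the search ("succ" or "pred"), so that X-cessor(q) is the set of successors or predecessors of q,

• STK is a stack which holds those nodes not yet interrogated (q <= STK pops the top node into q, and STK <= x pushes x), and

• q.mark and q.s are bits where:

   • q.mark indicates if the node has been "marked" (i.e., already visited),

   • q.s indicates if the node has property Q (i.e., candidate sinks).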
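The pseudocode of Figure 3.13 translates almost line for line into Python. In this sketch, which is not the thesis's implementation, the marks are kept in a dictionary instead of in node fields. The parameter `x_cessor` stands for X (successors or predecessors), and `is_sink` stands for the bit q.s. All parameter names are illustrative.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

Node = Hashable


def search(nodes: Iterable[Node], n: Node,
           x_cessor: Callable[[Node], Iterable[Node]],
           is_sink: Callable[[Node], bool]) -> Set[Node]:
    """Return the set S of sink nodes reachable from n (Figure 3.13).

    x_cessor plays the role of X: successors for a forward search,
    predecessors for a backward one.  is_sink(q) is the bit q.s."""
    mark: Dict[Node, bool] = {q: False for q in nodes}
    stk = [n]
    s: Set[Node] = set()
    while stk:
        q = stk.pop()
        for x in x_cessor(q):
            if not mark[x]:
                mark[x] = True
                if is_sink(x):
                    s.add(x)        # fluid exits here; do not propagate further
                else:
                    stk.append(x)   # keep searching beyond x
    return s
```

Because every node is marked the first time it is reached, each node is pushed at most once, and the search visits each edge out of a pushed node once.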


The next two subsections present specific applications which use this search technique.
3.3.1  Search For "Dereference" Points Of Dead Variables


Unreferenced data occurs at a node when the node is an assign point of a variable V, and V is dead (i.e., not live) at the node. This information can be determined by applying the technique presented in Section 3.2.1.1. The student may not understand why the data is unreferenced. If the student requests supplementary information, a search for "dereference" points can be performed. These "dereference" points are either:

• the end of the program, or

• a node at which variable V is assigned prior to a reference to V.

On completion of SEARCH (in Analysis 3.5), S contains the "dereference" nodes of variable V.


Apply SEARCH with

Node n: a dead assign point of variable V

X: "succ"

Sinks: nodes at which V is assigned, and the exit nodes


Analysis 3.5
DEREFERENCE POINTS
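Analysis 3.5 can be sketched on a hypothetical four-node flow graph. Node 1 (A = 0) is a dead assign point of A. Node 2 tests a condition. Node 3 reassigns A. Node 4 is the exit. The function name `dereference_points` and the dictionary encoding of the graph are assumptions for illustration. The search loop is a compact copy of SEARCH so that the example stands alone.

```python
from typing import Dict, List, Set


def dereference_points(succ: Dict[int, List[int]], n: int,
                       assigns_v: Set[int], exits: Set[int]) -> Set[int]:
    """Analysis 3.5: SEARCH forward ("succ") from the dead assign point n
    of V.  Sinks are the nodes that assign V and the exit nodes."""
    sinks = assigns_v | exits
    marked: Set[int] = set()
    stack: List[int] = [n]
    found: Set[int] = set()
    while stack:
        q = stack.pop()
        for x in succ.get(q, []):
            if x not in marked:
                marked.add(x)
                if x in sinks:
                    found.add(x)
                else:
                    stack.append(x)
    return found


# 1: A = 0   2: IF (X .GT. 0) GOTO 4   3: A = 2   4: STOP
succ = {1: [2], 2: [3, 4], 3: [4], 4: []}
points = dereference_points(succ, 1, assigns_v={1, 3}, exits={4})
```

Here `points` is `{3, 4}`. The value assigned at node 1 is discarded either by the reassignment at node 3 or by reaching the end of the program at node 4.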
3.3.2  Search For Sources Of Initialized Variables


Once an uninitialized variable, V, has been detected (by the technique of Section 3.2.1.2), it must still be determined whether it is partially or totally uninitialized. This can be done by searching for "sources" of initialization of V. If there are no such "sources", then V is totally uninitialized; otherwise, V is partially uninitialized. On completion of SEARCH (in Analysis 3.6), if S = ∅, then V is totally uninitialized; if S ≠ ∅, then V is partially uninitialized.


Apply SEARCH with

Node n: a node which references variable V but does not have attribute A (as determined by Analysis 3.4)

X: "pred"

Sinks: nodes at which V is assigned


Analysis 3.6
SOURCES OF INITIALIZATION
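Below is a sketch of Analysis 3.6 on a hypothetical graph. Node 1 reads X. Node 2 branches around node 3, which assigns V. Node 4 references V. The search runs backwards over a predecessor dictionary, and the result is classified by whether S is empty. The name `classify_uninitialized` is illustrative only.

```python
from typing import Dict, List, Set


def classify_uninitialized(pred: Dict[int, List[int]], n: int,
                           assigns_v: Set[int]) -> str:
    """Analysis 3.6: SEARCH backward ("pred") from node n, which references
    V without attribute A.  Sinks are the nodes that assign V."""
    marked: Set[int] = set()
    stack: List[int] = [n]
    sources: Set[int] = set()
    while stack:
        q = stack.pop()
        for x in pred.get(q, []):
            if x not in marked:
                marked.add(x)
                if x in assigns_v:
                    sources.add(x)     # a source of initialization of V
                else:
                    stack.append(x)
    return "partially" if sources else "totally"


# 1: READ X   2: IF (X .GT. 0) GOTO 4   3: V = 1   4: PRINT V
pred = {1: [], 2: [1], 3: [2], 4: [2, 3]}
```

With the assignment at node 3, `classify_uninitialized(pred, 4, {3})` returns `"partially"`. If no node assigns V (`set()`), it returns `"totally"`.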
3.4  Iterative Vs. Interval Analysis Techniques


Interval Analysis

Much work has been done on the use of intervals ([ALL70], [COC70]) to partition a flow graph into single-entry subgraphs, which can be analyzed more easily than the original flow graph. Various interval analysis techniques have been developed for performing specific global flow analyses of the underlying flow graph. These interval analysis techniques utilize a sequence of interval graphs that depend only on the structure of the underlying flow graph.

Techniques using interval analysis concepts could have been developed in lieu of the iterative techniques presented in this thesis, but the following undesirable properties were considered sufficient to exclude interval techniques:

• these techniques usually require the underlying flow graph to be reducible* (a reasonably strong property),

• interval formation is an O(|N|**2) process,

• extra memory is required for data structures associated with the intervals, derived graphs and information produced during the analysis, and

• since the student can change his program at any point during the analysis, techniques for maintaining the intervals, derived graphs and attached information would have to be developed ([GIL76]). This would require even more memory and time resources during execution.

* A reducible flow graph can always be generated from an arbitrary flow graph by a process known as node splitting ([SCH73]), but this duplicates certain nodes of the original flow graph and may cause management problems when interacting with the student.


Iterative Techniques


The iterative techniques presented in this thesis do not have the above undesirable properties, but do possess the following properties:

• the underlying flow graph need not be reducible,

• the DFST algorithm is of order O(|E|),

• only a small amount of extra memory is required for auxiliary data structures, and

• since the DFST algorithm is reasonably efficient, it can be reapplied (with acceptable overhead) whenever the student edits his program.

In addition, these iterative techniques are more easily understood by someone unfamiliar with global flow analyses.
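To show why rebuilding the depth-first ordering after each edit is cheap, here is a sketch that computes a reverse postorder (the node order used by the iterative techniques) with an explicit stack. It is a generic depth-first traversal, not the thesis's DFST algorithm. It assumes the flow graph is given as a successor dictionary with every node reachable from the entry.

```python
from typing import Dict, Iterator, List, Tuple


def reverse_postorder(succ: Dict[str, List[str]], entry: str) -> List[str]:
    """Walk a depth-first spanning tree from entry and return the nodes in
    reverse postorder.  Each edge is examined once, so the cost is O(|E|)
    when every node is reachable from entry."""
    visited = {entry}
    postorder: List[str] = []
    # Stack of (node, iterator over its successors) instead of recursion.
    stack: List[Tuple[str, Iterator[str]]] = [(entry, iter(succ.get(entry, [])))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in visited:
                visited.add(child)
                stack.append((child, iter(succ.get(child, []))))
                break               # descend into child first
        else:
            stack.pop()             # all successors done: node is finished
            postorder.append(node)
    return postorder[::-1]
```

For the loop of Figure 3.10 (S6 branching back to S4 or on to S7), this yields S1 through S7 in statement order. For a diamond a -> b, a -> c, b -> d, c -> d, it yields `["a", "c", "b", "d"]`.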




4  DETECTION OF SPECIFIC CONSTRUCTS

This chapter presents specific detection outlines for the anomalies presented in Chapter 2. The detection schemes presented here are to be implemented within an interactive compiler system. They address flow graphs that correspond to programs composed of the following elementary statements:

• simple assignment statements (i.e., no embedded assignments),

• input/output statements,

• flow of control (conditional and unconditional), and

• invocation of subroutines (and functions).

It  is  assumed  that  all  subroutines  have  single  entry  and  single 
exit  points.   When  these  detection  techniques  are  applied  within  a 
high  level  lanquage,  high  level  constructs  must  be  decomposed  into 
their  elementary  statements.   (This  can  be  done  by  using  the 
concept  of  induced  nodes;  see  Chapter  5.) 

The fact that subroutines are allowed as elementary statements introduces a number of interesting problems that must be solved. For instance, while analyzing a given flow graph, G, a node corresponding to the call of a subroutine (with flow graph H) may be encountered. The analysis may depend on information that cannot be determined without an analysis of H. Three basic approaches can be taken:

1) H can be preanalyzed before the analysis of G is started, so that the information needed for the analysis of G is already available,

2) the analysis of G can be suspended, H can be analyzed, and control returned to the analysis of G, or

3) the node in G that invokes H can be replaced by the flow graph of H.

Approach 3 will not be considered because:

• it is inefficient in both time and space,

• the flow graph may not be available (as with a built-in function or canned routine), and

• such an approach cannot be taken if subroutines are called recursively.

All applications of property P propagation (Section 3.2.1) will use approach 1 when information must be transmitted from subroutine to subroutine. This approach can be used because property P propagation is in a sense "static"; i.e., it does not depend on the values assigned to variables of the program. The application of invariant assertion analysis (Section 3.2.2) requires approach 2.


4.1  Applications of Property P Propagation


In this section, schemes are developed for detecting the constructs presented in Section 2.1. A large number of bit vector sets will be developed. Some are basic vectors that contain information extracted directly from the nodes of the flow graph and do not depend on any global flow analyses. Others are produced by a given global flow analysis and represent either intermediate results to be used as input to another analysis or final results from which anomaly information is to be extracted. Given a bit vector set, VXXXXXX, the particular vector of the set attached to node n will be denoted by VXXXXXX(n).

Basic Vectors

Two specialized vectors, VECTOR0 and VECTOR1, are used to represent the vectors of all zeros and all ones, respectively. (Only one copy of each of these vectors need be maintained in memory.) They are used, for instance, when a specific global flow analysis has no sinks (VECTOR0) or when a set of nodes are sinks of all properties (VECTOR1).

VVARREF represents a set of vectors that encodes information about VARiables REFerenced at a particular node of the flow graph. Each variable used in the program corresponds to a particular position of the bit vector. The bit in a particular position of VVARREF(n) is '1'B if the corresponding variable is referenced at node n. VVARASS encodes similar information about the VARiables ASSigned at a particular node of the flow graph. The bit in a particular position of VVARASS(n) is '1'B if node n is an assign point of the corresponding variable. These vectors are basic vectors for all types of nodes except subroutine invocations (for which they are determined by Analysis 4.1 and Analysis 4.2).
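Below is a minimal sketch of how such basic vectors might be built. It assumes Python integers as bit vectors, with bit 0 standing for the leftmost position of the thesis's vectors. The node contents are a hypothetical encoding of statements S1 to S4 of Figure 3.10. The existence vectors VEXISTN, described next, are included for comparison.

```python
from typing import Dict, List, Tuple

# Hypothetical node contents: (variables referenced, variables assigned).
nodes: Dict[str, Tuple[List[str], List[str]]] = {
    "S1": ([], ["A"]),                      # READ, A
    "S2": ([], ["I"]),                      # I = 1
    "S3": ([], ["SUM"]),                    # SUM = 0
    "S4": (["SUM", "A", "I"], ["SUM"]),     # SUM = SUM + A(I)
}

variables = sorted({v for refs, asgs in nodes.values() for v in refs + asgs})
position = {v: i for i, v in enumerate(variables)}   # bit position per variable


def to_vector(names: List[str]) -> int:
    """Pack a list of variables into a bit vector held in a Python int."""
    vec = 0
    for v in names:
        vec |= 1 << position[v]
    return vec


VVARREF = {n: to_vector(refs) for n, (refs, _) in nodes.items()}
VVARASS = {n: to_vector(asgs) for n, (_, asgs) in nodes.items()}
VECTOR0 = 0
VECTOR1 = (1 << len(variables)) - 1
# One position per node: only node n's own bit is set in VEXISTN(n).
VEXISTN = {n: 1 << i for i, n in enumerate(nodes)}
```

With the variables ordered A, I, SUM, `VVARREF["S4"]` has all three bits set and `VVARASS["S4"]` has only the SUM bit set.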

VEXISTN encodes information about the EXISTence of Nodes in the flow graph. Each position in the vector corresponds to a specific node of the flow graph. The bit corresponding to node n is '1'B in VEXISTN(n), and '0'B in all other vectors of the set.


VCOMPNT encodes information about the COMPute PoiNTs of expressions. Each total expression, X, computed in the program corresponds to a position in the vector. Within this basic vector, expressions that are the "same" are not identified with one another. Thus, each instance of any expression is given a different bit position in the vector, even though two expressions may be equivalent in some sense. The bit in VCOMPNT(n) corresponding to a given expression, X, is '1'B if n is a compute point of X, and '0'B otherwise.
VASSPNT encodes information about the ASSign PoiNTs of variables. Its content is similar to that of VVARASS, except that each separate assign point of a variable receives a separate bit position in the vector.

The correspondences used in VCOMPNT and VASSPNT are coordinated so that the assign point of a variable (encoded in VASSPNT) that is assigned the value of an expression, X, corresponds to the compute point of expression X (encoded in VCOMPNT). This is useful in detecting transfer variables (see Section 1.1.6), where a truncated form of VCOMPNT is used. The suggested ordering is, from left to right:

• In VCOMPNT: first those expressions assigned to variables, and second stand-alone expressions (such as an expression in an IF or a subscript expression).

• In VASSPNT: first those assign points involving expressions, and second assign points not involving expressions (such as an assign point in an input statement).
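The coordinated ordering can be sketched as follows. This assumes, for illustration only, a flattened list of statements tagged with the variable assigned and the expression computed, and it counts constant right-hand sides as expressions. Bit 0 again stands for the leftmost position. The k assigned expressions occupy the first k positions of both vectors. VCOMPNT truncated to k bits therefore lines up bit for bit with VASSPNT.

```python
from typing import List, Optional, Tuple

# Hypothetical flattened statements: (node, variable assigned or None,
# expression computed or None).
stmts: List[Tuple[str, Optional[str], Optional[str]]] = [
    ("S1", "A", None),            # READ, A: assign point without an expression
    ("S2", "I", "1"),
    ("S3", "SUM", "0"),
    ("S4", "SUM", "SUM+A(I)"),
    ("S5", "I", "I+1"),
    ("S6", None, "I.LE.10"),      # stand-alone expression in an IF
]

assigned_exprs = [s for s in stmts if s[1] is not None and s[2] is not None]
stand_alone = [s for s in stmts if s[1] is None and s[2] is not None]
plain_assigns = [s for s in stmts if s[1] is not None and s[2] is None]

comp_order = assigned_exprs + stand_alone    # VCOMPNT positions, left to right
ass_order = assigned_exprs + plain_assigns   # VASSPNT positions, left to right


def vector(order: List[Tuple[str, Optional[str], Optional[str]]], node: str) -> int:
    """Bit vector for node: bit i is set if position i belongs to node."""
    return sum(1 << i for i, s in enumerate(order) if s[0] == node)


VCOMPNT = {node: vector(comp_order, node) for node, _, _ in stmts}
VASSPNT = {node: vector(ass_order, node) for node, _, _ in stmts}

k = len(assigned_exprs)
truncate_mask = (1 << k) - 1   # keeps only the coordinated positions
```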


Determination Of Sinks And Sources


In order to detect a given anomaly (such as unreferenced data), a series of basic global flow analyses (such as live variable analysis) must be performed, but first the sinks and sources of the basic analysis must be determined. If these are not already known, then an elementary analysis must be performed to determine them. As a specific example of this, consider a generalization of the live variable analysis presented in Section 3.2.1.1 (Analysis 3.1). Such an analysis is sufficient when subroutine invocations are not permitted. But consider a code sequence such as

A = 5
CALL SUB1(A)
A = B.

In order to determine whether or not the variable A is live at the statement "A = 5", we must know whether A' (the parameter corresponding to A) is assigned prior to reference within subroutine SUB1. Thus, SUB1 must be analyzed to determine this fact. But SUB1 may call another subroutine with A' as argument, and this subroutine will have to be analyzed before SUB1 can be analyzed. Thus, the subroutines of the program must be analyzed in a "bottom-up" manner to determine the sources and sinks associated with live variable analysis. Once the sources and sinks have been determined, the live variable analysis can be performed on the subroutines of the program in a "top-down" manner.

Before  completinq  the  details  of  the  live  variable  analysis, 
consider  the  framework  in  which  the  separate  analyses  of  the 
subroutines  are  performed. 

Call Graph

A graph describing the calling dependency between the subroutines, hereafter called the call graph, can be created in which nodes of the graph correspond to subroutines and the edges of the graph indicate the calling sequence; i.e., edge (n,m) is an edge of the call graph iff the subroutine corresponding to node n calls the subroutine corresponding to node m. The call graph is clearly a flow graph whose entry node corresponds to the main procedure. It may have a very simple structure, such as a single node (if only a main procedure is present) or a dag, or it may contain cycles, in which case the subroutines contain direct or indirect recursive calls. The form of the call graph differs slightly from the standard characterization of a flow graph (as presented in this thesis) in that the call graph may have no exit nodes. This can occur when subroutines are called recursively.
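For an acyclic call graph, a bottom-up order (every callee before its callers) is simply a depth-first postorder from the main procedure, and reversing it gives a top-down order. The sketch below assumes the call graph is given as a dictionary from each subroutine to the subroutines it calls. The function name `bottom_up_order` is hypothetical. When the call graph has cycles, no such order exists. In that case the information attached to the call graph must be propagated iteratively until it stops changing, which this sketch does not attempt.

```python
from typing import Dict, List, Set


def bottom_up_order(calls: Dict[str, List[str]], main: str) -> List[str]:
    """Depth-first postorder of the call graph starting at the main procedure.
    In an acyclic call graph every callee precedes its callers."""
    visited: Set[str] = set()
    order: List[str] = []

    def visit(sub: str) -> None:
        visited.add(sub)
        for callee in calls.get(sub, []):
            if callee not in visited:
                visit(callee)
        order.append(sub)   # all callees of sub are already in order

    visit(main)
    return order


calls = {"MAIN": ["SUB1", "SUB2"], "SUB1": ["SUB2"], "SUB2": []}
bottom_up = bottom_up_order(calls, "MAIN")   # SUB2, SUB1, MAIN
top_down = bottom_up[::-1]                   # MAIN, SUB1, SUB2
```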

Given that a particular analysis is to be performed on the flow graphs of the subroutines, a suitable lattice space, meet operation, and flow function can be superimposed on the call graph in order to coordinate the separate analyses of the component subroutines with the appropriate information linkage. In the following discussion, a general outline is presented first; then specific applications to property P propagation are given. The call graph, with its superimposed lattice structure, will be used for two specific forms of analysis, bottom-up analysis and top-down analysis. Specific algorithms are not presented because their forms are similar to those of PROP_ALL and PROP_SOME.

Lattice Space

Given that there are N subroutines present in the program, the
call graph lattice space consists of N-tuples of the elementary
lattice space elements; i.e., (L[1], L[2], ..., L[N]), where each
L[j] is a member of the elementary lattice space associated with
subroutine j. Before analysis begins, (1, 1, ..., 1) is attached
to each node of the call graph as initialization.

Each L[j] will be referred to as an ELE (elementary lattice
element) and each N-tuple will be referred to as an SLE (super
lattice element). Each position in the N-tuple corresponds to a
particular subroutine of the program, and it is through these slots
that one subroutine transmits information to another. In the case
of bottom-up analysis, information is transmitted from the called
subroutine to the calling subroutine, and in top-down analysis,
from the calling subroutine to the called subroutine.

When applied to property P propagation, each L[j] is an M-bit
vector, and the initial element 1 is either VECTOR0 or VECTOR1,
depending on whether PROP_ALL or PROP_SOME is being applied to the
flow graphs of the individual subroutines.

Call Graph Meet Operation

The call graph meet operation is induced by the meet operations
used in the elementary analyses of the subroutines; i.e.,

$$(K_1, K_2, \ldots, K_N) \wedge (L_1, L_2, \ldots, L_N) = (K_1 \wedge_1 L_1,\ K_2 \wedge_2 L_2,\ \ldots,\ K_N \wedge_N L_N),$$

where $\wedge_i$ is the meet operation used in the analysis of
subroutine i.

Within the constraints of property P propagation, the meet
operation is either bitwise AND or bitwise OR.
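A minimal sketch of the componentwise meet, assuming each ELE is an M-bit vector stored as a Python integer (bit k standing for variable k) and that the meet used for each subroutine is supplied as a function. The names `sle_meet`, `bit_and`, and `bit_or` are illustrative, not taken from the thesis.

```python
# Illustrative sketch; names are hypothetical. An ELE is an int used as a bit vector.
from typing import Callable, Sequence

Meet = Callable[[int, int], int]


def bit_and(a: int, b: int) -> int:
    return a & b


def bit_or(a: int, b: int) -> int:
    return a | b


def sle_meet(k: Sequence[int], l: Sequence[int], meets: Sequence[Meet]) -> tuple[int, ...]:
    """Meet two SLEs slot by slot, using the meet of subroutine i for slot i."""
    if not len(k) == len(l) == len(meets):
        raise ValueError("both SLEs and the meet list must have N entries")
    return tuple(meet(a, b) for meet, a, b in zip(meets, k, l))


# Subroutine 1 is analyzed with bitwise AND, subroutine 2 with bitwise OR.
print(sle_meet((0b1100, 0b1100), (0b1010, 0b1010), (bit_and, bit_or)))  # (8, 14)
```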

Call Graph Flow Function

The discussion below presents an overview of the evaluation of
the flow function. Certain parameters are left unspecified and
will be resolved in the next two subsections on Bottom-up Analysis
and Top-down Analysis. The evaluation of the flow function at node
n (of the call graph) proceeds as follows:

1) Extract information (A) from the SLE attached to n.

2) Attach (A) to the corresponding node(s) of the subroutine
   to be analyzed.

3) Perform the required global flow analysis on the subroutine
   flow graph.

4) Extract information (B) from the node(s) of the subroutine
   flow graph.

5) Within a copy of the SLE attached to n, replace the
   corresponding information with (B).

The value of the flow function is the SLE created in step 5, and it
is propagated to successors (or predecessors). As information is
extracted from and inserted into the SLE, a transformation must be
applied to the ELE in order to recognize the correspondence between
the arguments of the calling routine and the parameters of the
called routine. The required homomorphism property of the flow
function is inherited from the homomorphism property of the flow
function used in the analysis of the subroutine flow graphs. The
information (A) and (B), and the unspecified nodes mentioned in
the above sequence of actions, depend on whether the call graph
analysis is a bottom-up analysis or a top-down analysis.
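The argument/parameter transformation can be pictured as a renumbering of bits. The sketch below assumes a hypothetical table `param_to_arg` that maps the bit of each formal parameter of the called routine to the bit of the corresponding actual argument in the calling routine; bits for the called routine's local variables are simply dropped on the way out. Global variables and aliasing are ignored. With a one-to-one table (no variable passed twice in the same call), both maps distribute over bitwise AND and OR, which is the kind of homomorphism the flow function relies on.

```python
# Illustrative sketch; param_to_arg and the function names are hypothetical.
def callee_to_caller(vec: int, param_to_arg: dict[int, int]) -> int:
    """Translate a called routine's bit vector into the caller's numbering.

    param_to_arg[i] = j: callee bit i (a formal parameter) corresponds to
    caller bit j (the actual argument). Callee bits not in the table are dropped.
    """
    out = 0
    for i, j in param_to_arg.items():
        if (vec >> i) & 1:
            out |= 1 << j
    return out


def caller_to_callee(vec: int, param_to_arg: dict[int, int]) -> int:
    """Translate a caller's bit vector into the called routine's numbering."""
    out = 0
    for i, j in param_to_arg.items():
        if (vec >> j) & 1:
            out |= 1 << i
    return out


# Hypothetical call: formal 0 <- actual 3, formal 2 <- actual 1.
table = {0: 3, 2: 1}
print(bin(callee_to_caller(0b101, table)))   # 0b1010
print(bin(caller_to_callee(0b1010, table)))  # 0b101
```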

The approach of superimposing a lattice structure on the call
graph is a very general one. Within this thesis, it will only
be applied to property P propagation, but the approach is not
restricted to such an application. For instance, the meet
operations applied in the separate subroutine analyses need not be
the same operation, and the separate analyses applied within the
subroutines need not be the same analysis. The only essential
requirement of the technique is that the analysis of a subroutine
can be completed (once it has been started with information
transmitted by the call graph flow function) with no further
introduction of information into the subroutine's lattice space.

For concreteness, the bottom-up and top-down analyses will be
described within the context of property P propagation, but the
generalization to an arbitrary flow analysis should be clear.

Bottom-up Analysis

In this analysis, information is propagated from successor to
predecessor of the call graph (i.e., from called subroutine to
calling subroutine). The nodes of the call graph can be processed
in any order but, for reasons of efficiency, should be processed in
postorder of the DFST. Given a specific node n (corresponding to
subroutine S) of the call graph, evaluation of the flow function
proceeds as follows:

1) For each subroutine, P, called within S, extract the
   corresponding ELE, A[P], from the SLE attached to node n.

2) Attach each A[P] as a source to all nodes within the flow
   graph of S that invoke P.

3) Perform the global flow analysis (either PROP_ALL or
   PROP_SOME) on the flow graph of S.

4) Extract the ELE, B[S], from the entry (or exit) node of the
   flow graph of S.

5) In a copy of the SLE attached to node n, replace the
   element corresponding to S with B[S].

This modified SLE is then propagated to all predecessors of the
call graph node corresponding to S. It should now be clear why this
technique produces the desired results. The SLEs supply a "mailbox
system" through which information is transferred. After a given
subroutine has been analyzed, information is extracted from its
entry (or exit) node, placed in the corresponding slot of the SLE,
and transmitted to its calling subroutines. Each calling subroutine
can then extract the information it needs and proceed with its
analysis.




[Figure 4.1: EVALUATION OF F IN BOTTOM-UP ANALYSIS. SLE1 is attached to node 2; SLE2 is propagated to node 1.]


Figure 4.1 shows a pictorial representation of the process.
In this figure the flow function is being evaluated at node 2. The
information A[3] and A[4] is extracted from SLE1 and attached as
sources to the nodes of the flow graph of subroutine 2 that call
subroutines 3 and 4. Upon completion of the elementary analysis of
subroutine 2, the lattice information B[2] attached to the entry
(or exit) node of the flow graph of subroutine 2 is extracted and
inserted into a copy of SLE1 (SLE2). SLE2 is then propagated to
node 1. As information is extracted from and inserted into SLEs,
the bits of the vector are transformed to recognize the
correspondence between the arguments of the calling routine and the
parameters of the called routine.

Top-down Analysis

In this analysis, information is transmitted from predecessor
to successor (i.e., from calling subroutine to called subroutine)
of the call graph. The nodes of the call graph can be processed in
any order but, for reasons of efficiency, should be processed in
reverse postorder of the DFST. Given a specific node n
(corresponding to subroutine S) of the call graph, evaluation of
the flow function proceeds as follows:

1) Extract the ELE, A[S], that corresponds to subroutine S
   from the SLE attached to node n.

2) Attach A[S] as a lattice element to the entry (or exit)
   node of the flow graph of S.

3) Perform the global flow analysis on the flow graph of S.

4) For each subroutine, P, called in S, collect the ELE,
   B[P], attached to the calling node in the flow graph
   of S.

5) Within a copy of the SLE, replace the corresponding
   information with B[P].

This modified SLE is then propagated to all successors of the call
graph node corresponding to S.
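The top-down scheme can be sketched as a round-robin iteration that visits subroutines in reverse postorder and meets the SLEs propagated by the callers. As assumptions for illustration, subroutines are numbered 0..N-1 so that slot i belongs to subroutine i, the top element is passed in as `top`, and the hypothetical `analyze(s, a_s)` receives A[S] and returns a dictionary mapping each called subroutine P to B[P] (steps 2-4). The toy analysis records, for each subroutine, which subroutines can appear above it on a call chain from the main procedure.

```python
# Illustrative sketch; top_down, analyze and the toy data are hypothetical.
from typing import Callable, Sequence

Meet = Callable[[int, int], int]


def top_down(callers: Sequence[Sequence[int]], order: Sequence[int],
             top: Sequence[int], meets: Sequence[Meet],
             analyze: Callable[[int, int], dict[int, int]]) -> dict[int, tuple[int, ...]]:
    """Iterate the top-down call graph flow function; return the SLE attached to each node."""
    out: dict[int, tuple[int, ...]] = {}
    attached: dict[int, tuple[int, ...]] = {}
    changed = True
    while changed:
        changed = False
        for s in order:  # reverse postorder: callers before the subroutines they call
            sle = tuple(top)
            for p in callers[s]:
                if p in out:  # SLEs not yet computed count as top
                    sle = tuple(m(a, b) for m, a, b in zip(meets, sle, out[p]))
            attached[s] = sle
            slots = list(sle)                           # step 5 works on a copy
            for p, b_p in analyze(s, sle[s]).items():   # step 1: A[S]; steps 2-4: B[P]
                slots[p] = b_p
            new = tuple(slots)
            if out.get(s) != new:
                out[s] = new
                changed = True
    return attached


# Toy use: 0 calls 1 and 2, 1 calls 2. B[P] = A[S] plus S itself.
calls = [[1, 2], [2], []]
callers = [[s for s in range(len(calls)) if t in calls[s]] for t in range(len(calls))]


def ancestors(s: int, a_s: int) -> dict[int, int]:
    return {p: a_s | (1 << s) for p in calls[s]}


n = len(calls)
sle_at = top_down(callers, [0, 1, 2], (0,) * n, [lambda a, b: a | b] * n, ancestors)
print(bin(sle_at[2][2]))  # 0b11: subroutines 0 and 1 lie above 2
```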

Live Variable Analysis

In order to see a concrete use of the call graph, let's return
to the live variable analysis. (The analyses developed here will
be applied in several of the detection schemes presented later in
this chapter.) Before live variable analysis can be performed on a
given flow graph, G, two pieces of information must be determined
about each node of the flow graph:

1) whether variable V is always assigned by the execution of the
   statement corresponding to the node (i.e., whether every
   control path within the node assigns V), and

2) whether, within the execution of the statement corresponding
   to the node, there exists a reference point of variable V that
   is not dominated by assign points of V.

For all elementary statements except subroutine invocations, the
above information is encoded in VVARREF and VVARASS. Analyses 4.1
and 4.2 generate these vectors for subroutine invocations.


Apply PROP_ALL in a bottom-up manner with

in: VECTOR0

Property P: "an assign point of V has been
encountered"

Sources of P: VVARASS (modified by the call graph
analysis)

Sinks of P: VECTOR0 (i.e., none)

X: "succ"

Attribute A: "this node is dominated by assign
points of V"

The modified VVARASS set will subsequently be referred to
as VVARASS'.


Analysis 4.1
DETERMINE VVARASS' FOR SUBROUTINES

The body of Analysis 4.1 is applied to each subroutine of the
program (i.e., each node of the call graph). Before the analysis
is applied to a specific subroutine, S, the ELE for each subroutine
invoked within S is extracted from the SLE and attached as a source
of property P to the corresponding node. PROP_ALL is then applied.
Upon completion of PROP_ALL, the lattice information (attribute A)
attached to the exit node of S is extracted and inserted in the
corresponding slot of the SLE, which is then propagated to all
predecessors. Analysis 4.2 is applied in the same manner, except
that, since it propagates toward predecessors (X: "pred"), its
result is taken from the entry node of S.
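The flow graph part of Analysis 4.1 can be sketched with a conventional forward "must" formulation. This is an assumption about what PROP_ALL computes, not its definition: the information reaching a node is the AND of what leaves its predecessors, each node adds its own always-assigned bits (VVARASS, already modified for call nodes), and the value leaving the exit node is the ELE placed in the subroutine's slot. Bit k stands for variable k; `must_assigned` and the example graph are hypothetical.

```python
# Illustrative sketch; must_assigned and the example flow graph are hypothetical.
def must_assigned(succ: dict[int, list[int]], entry: int,
                  vvarass: dict[int, int], nbits: int, entry_in: int = 0) -> dict[int, int]:
    """Bits of variables assigned on every path from the entry through each node."""
    preds: dict[int, list[int]] = {n: [] for n in succ}
    for n, ms in succ.items():
        for m in ms:
            preds[m].append(n)
    full = (1 << nbits) - 1                 # VECTOR1: optimistic start for the AND meet
    out = {n: full for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            inp = entry_in if n == entry else full   # "in: VECTOR0" at the entry
            for p in preds[n]:
                inp &= out[p]
            new = inp | vvarass[n]
            if new != out[n]:
                out[n] = new
                changed = True
    return out


# 0 -> {1, 2} -> 3 (exit). Node 1 assigns V0; node 2 assigns V0 and V1.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
out = must_assigned(succ, 0, {0: 0b00, 1: 0b01, 2: 0b11, 3: 0b00}, nbits=2)
print(bin(out[3]))  # 0b1: only V0 is assigned on every path to the exit
```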


Apply PROP_SOME in a bottom-up manner with

in: VECTOR0

Property P: "a reference point of V has been
encountered"

Sources of P: VVARREF (modified by the call graph
analysis)

Sinks of P: VVARASS'

X: "pred"

Attribute A: "there exists a subsequent reference to
the current data contained in variable V"

The modified VVARREF set will subsequently be referred to
as VVARREF'.


Analysis 4.2
DETERMINE VVARREF' FOR SUBROUTINES


Now that VVARREF' and VVARASS' have been determined for the
subroutine invocations of the program, the live variable analysis
(Analysis 4.3) can be performed. The body of this analysis is

Apply PROP_SOME in a top-down manner with

in: inserted by the call graph analysis

Property P: "a reference point of V has been
encountered"

Sources of P: VVARREF'

Sinks of P: VVARASS'

X: "pred"

Attribute A: "variable V is live"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VLIVVAR.


Analysis 4.3
DETERMINE LIVE VARIABLES


applied to the flow graph of each subroutine. Before
the analysis is applied to a specific subroutine, S, the ELE
corresponding to S is extracted from the SLE and attached as a
lattice element to the exit node of S. PROP_SOME is then applied.
Upon completion of PROP_SOME, the lattice information (attribute A)
attached to each subroutine invocation within S is extracted and
inserted into the corresponding slots of the SLE, which is then
propagated to all successors.
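For a single subroutine, Analysis 4.3 behaves like the textbook backward live-variable computation; the sketch below uses that standard formulation as a stand-in for PROP_SOME with X = "pred". It assumes bit k stands for variable k, that `ref` and `ass` hold VVARREF' and VVARASS' for each node, and that `exit_live` is the ELE taken from the SLE and attached to the exit node. Whether VLIVVAR corresponds to the live-in or the live-out value depends on where PROP_SOME places attribute A, so both are returned. The function name and example are hypothetical.

```python
# Illustrative sketch; live_variables and the example flow graph are hypothetical.
def live_variables(succ: dict[int, list[int]], exit_node: int,
                   ref: dict[int, int], ass: dict[int, int],
                   exit_live: int = 0) -> tuple[dict[int, int], dict[int, int]]:
    """Backward liveness: live_in = ref | (live_out & ~ass); OR over successors."""
    live_in = {n: 0 for n in succ}   # VECTOR0: optimistic start for the OR meet
    live_out = {n: 0 for n in succ}
    changed = True
    while changed:
        changed = False
        for n in reversed(list(succ)):
            out = exit_live if n == exit_node else 0
            for m in succ[n]:
                out |= live_in[m]
            # Upward-exposed references, plus variables live after the node
            # that the node does not always assign.
            inn = ref[n] | (out & ~ass[n])
            if (inn, out) != (live_in[n], live_out[n]):
                live_in[n], live_out[n] = inn, out
                changed = True
    return live_in, live_out


# 0: X = ...   1: Y = X   2: PRINT Y (exit).  Bit 0 is X, bit 1 is Y.
succ = {0: [1], 1: [2], 2: []}
live_in, live_out = live_variables(succ, 2, ref={0: 0b00, 1: 0b01, 2: 0b10},
                                   ass={0: 0b01, 1: 0b10, 2: 0b00})
print(bin(live_out[0]), bin(live_out[1]))  # 0b1 0b10
```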

The remainder of this section is devoted to developing
specific detection schemes for the anomalies presented in Chapter
2. The general call graph techniques presented above are employed
in many of the detection schemes and should be well understood
before continuing.

4.1.1 Unreferenced Data


Unreferenced data deals with the detection of data assigned to
a variable that is not subsequently referenced by the program
(see Section 2.1). The general approach is to find compute points
of an arbitrary variable V at which V is dead.

Detection Outline

S1:  Determine the position of dead variables.
     Perform Analyses 4.1, 4.2, and 4.3.

S2:  Determine the set, U, of nodes for which a variable is
     dead at its assign point.

     $U = \{u \mid \mathrm{VVARASS}(u) \mathbin{\&} \lnot \mathrm{VLIVVAR}(u) \neq \mathrm{VECTOR0}\}$

S3:  For each u ∈ U, perform S4-S5.

S4:  Present "unreferenced" statement to the student.

S5:  If the student does not understand why the data is
     unreferenced (and thus requests more information), then:

S5a: Search for the set of "dereference" points.
     Perform Analysis 4.4. S (the output of SEARCH)
     contains the nodes at which V is "dereferenced".

S5b: Present "dereference" information to the student.
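Step S2 is a per-node bit-vector test. The sketch assumes bit k stands for variable k and that VLIVVAR(u) describes liveness immediately after node u (the interpretation under which an assignment with no later use shows up as dead). The dictionaries stand for the VVARASS and VLIVVAR vectors; their contents, and the name `unreferenced`, are hypothetical.

```python
# Illustrative sketch; the function name and vector contents are hypothetical.
def unreferenced(vvarass: dict[int, int], vlivvar: dict[int, int]) -> dict[int, int]:
    """Map each node u in U to the bits of variables assigned at u but dead there."""
    u_set = {}
    for u, assigned in vvarass.items():
        dead = assigned & ~vlivvar[u]   # VVARASS(u) AND (NOT VLIVVAR(u))
        if dead != 0:                   # != VECTOR0
            u_set[u] = dead
    return u_set


# Node 1 assigns Y (bit 1), but Y is not live after node 1.
print(unreferenced({0: 0b01, 1: 0b10, 2: 0b00}, {0: 0b01, 1: 0b00, 2: 0b00}))  # {1: 2}
```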

Time And Space Summary

Three elementary global flow analyses are performed, requiring
the use of VVARREF and VVARASS. VLIVVAR is generated as an output.
For each "dereference" request, one SEARCH analysis is performed.

Given u ∈ U, invoke SEARCH with

G: flow graph of the subroutine in which u is
present

n: u

X: "succ"

Sinks: nodes that assign variable V (VVARASS') and
the exit node (if V is not live upon exit from
the subroutine).


Analysis 4.4
FIND "DEREFERENCE" POINTS


4.1.2 Uninitialized Variables


The concept of uninitialized variables deals with the
existence of a control path from the entry node, e, of the main
program to a node, n (which references V), such that the control
path contains no assign point of V. The variable V is partially
uninitialized if there exists such a control path; it is totally
uninitialized if all control paths from e to n contain no assign
point of V (see Section 2.2).


Detection Outline

Apply PROP_ALL in a top-down manner with

in: inserted by the call graph analysis

Property P: "an assign point of V has been
encountered"

Sources of P: VVARASS'

Sinks of P: VECTOR0 (i.e., none)

X: "succ"

Attribute A: "this node is dominated by assign
points of V"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VASSDOM.


Analysis 4.5
REGION DOMINATED BY ASSIGN POINTS


S1:  Determine the region of the flow graph dominated by
     assign points.
     Perform Analysis 4.1. (This determines VVARASS',
     which is an input to Analysis 4.5.) Perform Analysis
     4.5.

S2:  Determine the set, U, of nodes containing reference
     points of V not dominated by assign points of V.

     $U = \{u \mid \mathrm{VVARREF}'(u) \mathbin{\&} \lnot \mathrm{VASSDOM}(u) \neq \mathrm{VECTOR0}\}$

S3:  For each u ∈ U, perform steps S4-S5.

S4:  Determine whether u corresponds to a totally or partially
     uninitialized variable.
     Perform Analysis 4.6. If S (the output of SEARCH)
     is empty, then variable V is totally uninitialized;
     otherwise it is partially uninitialized.

S5:  Present appropriate "uninitialized" statement to the
     student.
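Steps S2 and S4 can be sketched together. The search below is an assumed reading of SEARCH with X = "pred": starting from u it walks backward over predecessors, stops at nodes that assign V (the sinks), and reports the sinks it reaches as S. Bit k stands for variable k; the function names, graph, and vectors are hypothetical, and the call graph side (the ELE inserted at the entry node) is omitted.

```python
# Illustrative sketch; function names, graph and vectors are hypothetical.
def uninitialized_refs(vvarref: dict[int, int], vassdom: dict[int, int]) -> dict[int, int]:
    """S2: nodes u whose references to some variable are not dominated by assignments."""
    return {u: r & ~vassdom[u] for u, r in vvarref.items() if r & ~vassdom[u]}


def search_pred(preds: dict[int, list[int]], u: int, bit: int,
                vvarass: dict[int, int]) -> set[int]:
    """Backward search from u; return the assign points of variable `bit` it reaches."""
    sinks: set[int] = set()
    seen: set[int] = set()
    stack = list(preds[u])
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if (vvarass[n] >> bit) & 1:
            sinks.add(n)          # an assign point: do not search beyond it
        else:
            stack.extend(preds[n])
    return sinks


def classify(preds: dict[int, list[int]], u: int, bit: int, vvarass: dict[int, int]) -> str:
    """S4: an empty S means totally uninitialized, otherwise partially."""
    return "totally" if not search_pred(preds, u, bit, vvarass) else "partially"


# 0 (entry) branches to 1 (assigns X, bit 0) and 2; both reach 3, which references X.
preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
vvarass = {0: 0, 1: 0b1, 2: 0, 3: 0}
print(uninitialized_refs({0: 0, 1: 0, 2: 0, 3: 0b1}, {0: 0, 1: 0b1, 2: 0, 3: 0}))  # {3: 1}
print(classify(preds, 3, 0, vvarass))  # partially
```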

Time And Space Summary

Two elementary global flow analyses are performed, both of
which use VVARASS. VASSDOM is generated as an output. For each
uninitialized variable found, one SEARCH analysis is performed.


Given u ∈ U, invoke SEARCH with

G: flow graph of the subroutine in which u occurs

n: u

X: "pred"

Sinks: those nodes that assign V (VVARASS')


Analysis 4.6
SEARCH FOR INITIALIZATION POINTS


4.1.3 Program Structure

There are several reasons for wanting to have information
about the structure of the student's program. One is that other
analyses may depend on having such information. A more important
reason is the ability to present information to the student about
the structure of his program. Two specific structures will be
discussed here: loops and IF-THEN-ELSE constructs.

Students will often use low-level constructs (i.e., GOTOs and
LABELs) to implement high-level constructs already available as
features of the source language. For instance, when a student who
first learned FORTRAN starts programming in PL/I, he may continue
to implement IF-THEN-ELSE constructs by the use of GOTOs and
LABELs. He may also use similar techniques to implement a DO-WHILE
loop. The ability to detect these user-implemented constructs
allows the detection system to comment on the student's use of
language constructs.

The basic information used in detecting these constructs is
accessibility. A node, m, is accessible from node n iff there
exists a path from n to m that does not contain a back edge (see
Section 3.4). Recall that every cycle in the flow graph contains
at least one back edge. The set of nodes accessible from a given
node, m, is the set of descendants of m that can be reached without
traversing a back edge.

The concept of accessibility does not depend on the structure
of the call graph, and no information is transferred between
subroutines. Thus, the analyses presented here are performed
independently of the structure of the call graph.




Apply PROP_SOME to the flow graph of each subroutine of
the program independent of the call graph structure with

in: VECTOR0

Property P: "node n has been encountered"

Sources of P: only the node n (VEXISTN)

Sinks of P: nodes induced by back edges (VECTOR1 for
nodes induced by back edges; VECTOR0 for all
others)

X: "pred"

Attribute A: "node n is accessible from this node"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VACCESS1.


Analysis 4.7
DETERMINE NODES ACCESSIBLE FROM A GIVEN NODE


Analysis 4.7 determines for each node, n, of the flow graph
those nodes accessible from n. Analysis 4.8 determines for each
node, n, of the flow graph those nodes from which n is accessible.
Within these analyses, when a given node, n, is processed, the bit
corresponding to node n starts propagating through the flow
graph to its predecessors (or successors). We want the bit to
continue to propagate until a back edge is encountered. Since
edges of the flow graph contain no flow information, they cannot
stop the bit from propagating further; only the flow information in
a node can do this. Thus, for each back edge, (m,n), of the flow

Apply PROP_SOME to the flow graph of each subroutine of
the program independent of the call graph structure with

in: VECTOR0

Property P: "node n has been encountered"

Sources of P: VEXISTN

Sinks of P: nodes induced by back edges (VECTOR1 for
nodes induced by back edges; VECTOR0 for all
others)

X: "succ"

Attribute A: "this node is accessible from node n"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VACCESS2.

Analysis 4.8
DETERMINE NODES FROM WHICH A GIVEN NODE IS ACCESSIBLE

graph, an induced node¹⁰, p, is inserted; i.e., a new node, p, is
created, edge (m,n) is deleted, and new edges (m,p) and (p,n) are
inserted. These induced nodes are then made sinks of all bit
propagation, the effect being that the back edges inhibit the bit
propagation. Upon completion of Analyses 4.7 and 4.8, the nodes
induced by the back edges are removed, and the original flow graph
is reinstated.
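The effect of the induced nodes can be reproduced, for illustration, by simply ignoring back edges: once the back edges of a DFST are known, the nodes accessible from n are those reachable along the remaining (acyclic) edges. The sketch below takes that shortcut instead of inserting induced nodes and running PROP_SOME, and it returns sets rather than bit vectors; the function names and example graph are hypothetical. Because every cycle contains a back edge, a node is never accessible from itself.

```python
# Illustrative sketch; skips back edges instead of inserting induced nodes.
def dfst_back_edges(succ: dict[int, list[int]], entry: int) -> set[tuple[int, int]]:
    """Back edges (edges to a DFST ancestor) found by depth-first search from entry."""
    state: dict[int, str] = {}
    back: set[tuple[int, int]] = set()

    def visit(n: int) -> None:
        state[n] = "active"
        for m in succ[n]:
            if m not in state:
                visit(m)
            elif state[m] == "active":
                back.add((n, m))
        state[n] = "done"

    visit(entry)
    return back


def accessibility(succ: dict[int, list[int]], entry: int):
    """Return (back_edges, VACCESS1-like sets, VACCESS2-like sets)."""
    back = dfst_back_edges(succ, entry)
    fwd = {n: [m for m in succ[n] if (n, m) not in back] for n in succ}
    rev: dict[int, list[int]] = {n: [] for n in succ}
    for n, ms in fwd.items():
        for m in ms:
            rev[m].append(n)

    def reach(graph: dict[int, list[int]], start: int) -> set[int]:
        seen: set[int] = set()
        stack = list(graph[start])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(graph[x])
        return seen

    acc1 = {n: reach(fwd, n) for n in succ}  # nodes accessible from n
    acc2 = {n: reach(rev, n) for n in succ}  # nodes from which n is accessible
    return back, acc1, acc2


# 0 -> 1 -> 2 -> 3, with a loop edge 2 -> 1.
back, acc1, acc2 = accessibility({0: [1], 1: [2], 2: [1, 3], 3: []}, 0)
print(sorted(back), sorted(acc1[1]), sorted(acc2[2]))  # [(2, 1)] [2, 3] [0, 1]
```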




10 See Chapter 5 for more information on induced nodes.

4.1.3.1 Loops


The relationship of back edges to loops is well documented for
reducible flow graphs (see [HEC72]). In such a flow graph, every
loop is a single-entry cycle and every back edge, (m,n), defines a
loop. The head of the loop, n, dominates all nodes of the loop and
the latch, m, determines the scope of the loop. Within the
framework of this thesis, however, the flow graph is not
necessarily reducible, and the characterization of loops is
slightly different.

Characterization Of Loops

Every cycle contains at least one back edge, and, of course,
given a back edge, the edge is contained in some cycle. Cycles may
have multiple entries since the flow graph is not necessarily
reducible. The set of back edges is not unique for a given flow
graph; i.e., two different DFSTs may produce different sets of back
edges ([HEC72]). We can, however, use the back edges of a specific
DFST to define loops within the flow graph.

Every back edge, (m,n), defines a loop. The head of the loop,
n, is one of the entry points of the cycle. The tail of the loop,
m, is used to define the scope of the loop, which is the set of
nodes contained within the loop. The scope of the loop consists of
the head, the tail, and the set of nodes that are both accessible
from the head and have access to the tail; i.e.,

$\mathrm{scope} = HT \cup (A1 \cap A2)$, where

• HT is the set consisting of the head and tail,

• A1 is the set of nodes accessible from the head, and

• A2 is the set of nodes from which the tail is accessible.

Clearly, such a set of nodes is contained within a cycle since
there exists a path from any node of the set to any other node of
the set. Note that this definition of a loop allows multiple-entry
cycles, and reduces to the conventional definition of a loop when
the flow graph is, in fact, reducible.


Detection  Outline 


S1:  Determine the accessibility of nodes.
     Perform Analyses 4.7 and 4.8. For each node of the
     flow graph, VACCESS1 and VACCESS2 define the sets A1
     and A2 mentioned above.

S2:  For each back edge, (m,n), of the flow graph, perform step
     S3.

S3:  Determine the set of nodes in the loop defined by (m,n).
     The set of nodes contained in the loop is {m} ∪ {n}
     ∪ B, where B is the set of nodes defined by ANDing
     VACCESS1(n) and VACCESS2(m).
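Step S3 costs one AND and two OR operations per back edge. The sketch assumes bit k of a vector stands for node k and that VACCESS1 and VACCESS2 are already available as integers; the values below were worked out by hand for a hypothetical flow graph with edges 0->1, 1->2, 1->3, 2->3, 3->4 and the back edge 3->1. The function names are illustrative.

```python
# Illustrative sketch; function names and the example vectors are hypothetical.
def loop_scope(back_edge: tuple[int, int], vaccess1: list[int], vaccess2: list[int]) -> int:
    """Nodes of the loop defined by back edge (m, n), as a bit vector: {m} | {n} | B."""
    m, n = back_edge                      # m is the tail (latch), n is the head
    b = vaccess1[n] & vaccess2[m]         # accessible from the head AND reaching the tail
    return (1 << m) | (1 << n) | b


def nodes_of(vec: int) -> list[int]:
    return [k for k in range(vec.bit_length()) if (vec >> k) & 1]


VACCESS1 = [0b11110, 0b11100, 0b11000, 0b10000, 0b00000]  # nodes accessible from k
VACCESS2 = [0b00000, 0b00001, 0b00011, 0b00111, 0b01111]  # nodes from which k is accessible
print(nodes_of(loop_scope((3, 1), VACCESS1, VACCESS2)))   # [1, 2, 3]
```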

The different high-level languages each have their own
specialized forms of loops; e.g., FOR loops, REPEAT loops, and WHILE
loops. Since such specialized loops are language dependent, their
identification will not be discussed in this thesis. A general
outline for identifying such specialized loops is:

1) determine the induction variable of the loop,

2) determine the type of the induction variable, and

3) determine where (in the loop) and how (by how much) the
   induction variable is modified.

For more details on induction variables, see [FON76].

Time And Space Summary

Two elementary global flow analyses are performed, requiring
the use of VEXISTN. VACCESS1 and VACCESS2 are produced as output.
For every back edge, one AND and two OR operations are used to
determine the scope of the loop.


4.1.3.2 IF-THEN-ELSE


Characterization Of IF-THEN-ELSE


For the purposes of this thesis, an IF-THEN-ELSE construct is
present if:

1) There exists a node, n, with two and only two successors, p
   and q, such that neither of the edges (n,p) and (n,q) is a
   back edge (i.e., the IF-THEN-ELSE cannot be associated
   with the boundary of a cycle).

2) A ∩ B ≠ ∅, where A = {p} ∪ {nodes accessible from p} and
   B = {q} ∪ {nodes accessible from q}. This implies that
   there is a common region of the flow graph that is
   reachable by traversing a "forward" path from each exit
   edge of node n.

3) There exists a lock node, r, in A ∩ B such that all other
   nodes of A ∩ B are accessible from r (i.e., the separate
   "forward" paths taken from n culminate at a unique node).
   Note that if there is such a node, r, then there can only
   be one.

The two separate cases of the IF-THEN-ELSE are then A ∩ {nodes from
which r is accessible} and B ∩ {nodes from which r is accessible}.

Note that the above definition allows entry points other than
node n and exit points other than node r.

Detection Outline

S1:  Determine the accessibility of nodes.
     Perform Analyses 4.7 and 4.8.

S2:  For each node, n, of the flow graph that has exactly two
     successors, p and q, such that neither (n,p) nor (n,q)
     is a back edge, perform S3-S5.

S3:  Determine the nodes accessible from node n.
     A = {p} ∪ {nodes encoded in VACCESS1(p)}. B = {q} ∪
     {nodes encoded in VACCESS1(q)}. Let C = A ∩ B. If
     C = ∅, then there is no IF-THEN-ELSE (i.e., return
     to S2).

S4:  Search for the lock node.
     For each node, r, in C, we must check to see if C -
     {r} = {nodes encoded in VACCESS1(r)}. If there is
     such an r, then it is the lock node (otherwise,
     return to S2).

S5:  Determine the separate cases of the IF-THEN-ELSE.
     The two cases of the IF-THEN-ELSE are A ∩ {nodes
     encoded in VACCESS2(r)} and B ∩ {nodes encoded in
     VACCESS2(r)}.
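Steps S2 through S5 translate almost directly into bit-vector operations. The sketch assumes bit k stands for node k, that the back edges and the VACCESS1/VACCESS2 vectors have already been computed, and that successor lists are given; the diamond-shaped example graph and its vectors are hypothetical and were worked out by hand.

```python
# Illustrative sketch; find_if_then_else and the example data are hypothetical.
def find_if_then_else(succ: dict[int, list[int]], back: set[tuple[int, int]],
                      vaccess1: list[int], vaccess2: list[int]) -> list[tuple[int, int, int, int]]:
    """Return (n, lock node r, first case, second case) for each IF-THEN-ELSE found."""
    found = []
    for n, ss in succ.items():
        if len(ss) != 2:
            continue                                  # S2: exactly two successors
        p, q = ss
        if (n, p) in back or (n, q) in back:
            continue                                  # S2: neither edge is a back edge
        a = (1 << p) | vaccess1[p]                    # S3
        b = (1 << q) | vaccess1[q]
        c = a & b
        if c == 0:
            continue                                  # no common region
        for r in range(c.bit_length()):               # S4: look for the lock node
            if (c >> r) & 1 and (c & ~(1 << r)) == vaccess1[r]:
                found.append((n, r, a & vaccess2[r], b & vaccess2[r]))  # S5
                break
    return found


# Diamond: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4 (no back edges).
succ = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
VACCESS1 = [0b11110, 0b11000, 0b11000, 0b10000, 0b00000]
VACCESS2 = [0b00000, 0b00001, 0b00001, 0b00111, 0b01111]
print(find_if_then_else(succ, set(), VACCESS1, VACCESS2))  # [(0, 3, 2, 4)]
```

Here node 3 is the lock node, and the two cases are {1} (bit vector 2) and {2} (bit vector 4).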

Time And Space Summary

Two elementary global flow analyses are performed, requiring
the use of VEXISTN. VACCESS1 and VACCESS2 are produced as outputs.
For every candidate IF-THEN-ELSE, an analysis is performed to verify
the IF-THEN-ELSE and identify its two cases.


4.1.4 Subroutine Parameters


This section addresses two specific anomalies, which occur in
languages that use call-by-reference to implement parameter
passing:

• local variables in a parameter list, and

• modification of an input parameter (which is not an output
  parameter).

The detection of such parameter anomalies is accomplished by
determining whether or not a variable (in the parameter list) is an
input and/or output variable. For the purposes of this thesis, the


Apply PROP_SOME in a bottom-up manner with

in: VECTOR0

Property P: "an assign point of V has been
encountered"

Sources of P: VVARASS (modified by the call graph
analysis)

Sinks of P: VECTOR0 (i.e., none)

X: "succ"

Attribute A: "there is a previous assignment to
variable V"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VSUBASS.


Analysis 4.9
DETERMINE ASSIGNMENT POINTS INSIDE SUBROUTINES


following  definitions  apply.   An  input  variable  is  a  parameter 
that: 

11)  has  a  source  of  initialization  in  an  ancestor  of  the  call 
to  the  subroutine,  and 

12)  potentially  references  the  value  supplied  by  the  external 
initialization  within  the  subroutine. 

An  output  variable  is  a  parameter  that: 

01)  has  an  assiqn  point  inside  the  subroutine,  and 
r>2)  has  a  reference  point  in  a  descendant  of  the  call  to  the 
subroutine. 
Note that these definitions depend on the context in which the
subroutine is called. I2 is encoded in VVARREF'. O2 is determined
by the live variable analysis, and the required information is
encoded in VLIVVAR. O1 is determined by applying Analysis 4.9, and
I1 is determined by applying Analysis 4.10.


Apply PROP_SOME in a top-down manner with

in: modified by call graph analysis

Property P: "an assign point of V has been
encountered"

Sources of P: VSUBASS for subroutine invocations;
VVARASS for all other nodes

Sinks of P: VECTOR0 (i.e., none)

X: "succ"

Attribute A: "there is a prior assignment to
variable V"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VPRIASS.


Analysis 4.10
DETERMINE ASSIGNMENTS IN PREDECESSOR NODES

Detection Outline

S1: Determine the input/output status of all variables in
subroutine calls.

Perform Analyses 4.1, 4.2, 4.3, 4.9, and 4.10. At
each node, n, that invokes a subroutine, the input
status of all variables is determined by ANDing
VVARREF'(n) and VPRIASS(n); the output status is
determined by ANDing VLIVVAR(n) and VSUBASS(n).

S2: For every parameter, V, in each subroutine call perform
step S3.

S3: If V is not an output variable, then

S3a: If V is not an input variable, then "V is a local
variable and should not appear in the parameter list."

S3b: Otherwise, (V is an input variable but not an output
variable) if V is modified inside the subroutine
(determined by VSUBASS) then "V is a modified input
variable."
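A minimal sketch of steps S1 to S3 follows. It assumes each vector is held as a Python int in which bit i stands for variable i, and that VVARREF'(n), VPRIASS(n), VLIVVAR(n) and VSUBASS(n) for the invoking node n have already been produced by the analyses named in S1. The function name, the `params` mapping and the message strings are illustrative.

```python
def parameter_anomalies(params, vvarref_p, vpriass, vlivvar, vsubass):
    """params: parameter name -> its bit position at the invoking node n.
    The other arguments are VVARREF'(n), VPRIASS(n), VLIVVAR(n), VSUBASS(n)."""
    inputs = vvarref_p & vpriass    # I2 and I1 both hold
    outputs = vlivvar & vsubass     # O2 and O1 both hold
    messages = []
    for name, bit in params.items():
        mask = 1 << bit
        if outputs & mask:
            continue                # S3: output variables are not reported
        if not inputs & mask:       # S3a: neither input nor output
            messages.append(f"{name} is a local variable and should not "
                            "appear in the parameter list.")
        elif vsubass & mask:        # S3b: input only, but assigned inside
            messages.append(f"{name} is a modified input variable.")
    return messages
```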


Time And Space Summary

Five elementary global flow analyses are performed requiring
the use of VVARREF and VVARASS. VLIVVAR, VSUBASS, and VPRIASS are
generated as outputs.


4.1.5  Common  Expression  Detection 


Common expression detection, as used in this thesis, deals
with the detection of multiply occurring expressions (or
subexpressions) that:

• are tree equivalent,

• compute the same values at execution time, and

• are positioned in the program such that one or more of the
occurrences can be eliminated.
Some  useful  definitions  are: 
A defining variable of an expression X is a variable whose
value is used in the computation of the expression.

A total expression in a program is a complete expression that
does not occur as a subexpression.

An expression, X, computed at node n is recalculable at node m
if node n dominates node m and all paths from node n to node m are
free of assign points of defining variables of X.

Two expressions are tree equivalent if their corresponding
parse trees are identical. This definition of tree equivalent does
not recognize the associative, commutative and distributive
properties of operators. It is used here because tree equivalent
expressions are reasonably simple to detect. Other definitions of
equivalence may be used as long as the equivalence of expressions
can be detected.


Since the purpose of this analysis is to convey information to
the student (and not to make his program more efficient), not all
subexpressions contained within the program need be considered
candidates for analysis. Only expressions that appear in the
program as "total expressions" are considered. This partially
resolves one of the problems normally encountered in common
subexpression analysis, namely that there is normally a very large
number of candidate subexpressions from which a small subset must
be extracted.
For a given expression, X, the analysis must determine compute
points of X at which X is recalculable. It is assumed that a
prepass has been performed that determines candidate expressions
and tree equivalent expressions.


Prepass


The prepass is performed to determine those expressions in the
program that are tree equivalent and are to be used as candidates
for common expression detection. We will use as basic candidate
expressions all total expressions minus all expressions containing
no operators (i.e., single constants and variables). The Polish
postfix form of each of these expressions is generated as a
contiguous string, and all postfix forms are held contiguously in
memory separated by some special character, say #, that cannot
appear in any expression string. Each basic candidate expression
can now be searched for in the composite string. If the
implementation machine (and language) has some type of scan command
that crosses word boundaries, this search can be easily implemented.
All matching expressions are tree equivalent expressions.
Subexpressions that are not total expressions may have been found
to match certain basic candidate expressions. Such subexpressions
are considered as new independent candidate expressions, and a
compute point is defined for them within the node in which they are
computed.
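The prepass can be sketched as follows, assuming expression trees are nested tuples (operator, left, right) with string leaves and binary operators only. Postfix forms are kept as token lists rather than character strings, so a match cannot straddle two operand names; the "#" token plays the role of the separator described above. With fixed-arity operators, a contiguous run of postfix tokens that forms a complete expression is always a whole subtree, so each match is a tree-equivalent occurrence. The function names are hypothetical.

```python
def postfix(expr):
    """Polish postfix form of an expression tree, as a list of tokens."""
    if isinstance(expr, str):
        return [expr]
    op, left, right = expr
    return postfix(left) + postfix(right) + [op]


def find_tree_equivalents(total_exprs):
    """For each basic candidate (a total expression containing an operator),
    list (expression index, token offset) of every occurrence of its postfix
    form in the composite string. A match that does not cover a whole total
    expression is a subexpression occurrence."""
    forms = [postfix(e) for e in total_exprs]
    composite, owner, offset = [], [], []
    for idx, form in enumerate(forms):
        for k, tok in enumerate(form + ["#"]):   # '#' separates expressions
            composite.append(tok)
            owner.append(idx)
            offset.append(k)
    matches = {}
    for i, expr in enumerate(total_exprs):
        if isinstance(expr, str):
            continue          # single constant or variable: not a candidate
        pat = forms[i]
        n = len(pat)
        matches[i] = [(owner[p], offset[p])
                      for p in range(len(composite) - n + 1)
                      if composite[p:p + n] == pat]
    return matches
```

For example, with the total expressions A+B, (A+B)*C and D, the candidate A+B is found at the start of both of the first two postfix forms (the second being a subexpression occurrence), while D is skipped because it contains no operator.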
A new set of vectors, VCOMEXP, is created in which each
position corresponds to a compute point of each separate multiply
occurring candidate expression. If node n is a compute point of a
given candidate expression, X, then all bits of VCOMEXP(n)
corresponding to candidate expressions that are tree equivalent to
X are set to '1'B. Members of this set of vectors are used as
sources in Analysis 4.11.

A new set of vectors, VDEFVAR, is created in which each
position corresponds to a compute point of each separate multiply
occurring candidate expression (the same correspondence as in
VCOMEXP). If node n is an assign point of a defining variable of a
candidate expression, X, then the position in VDEFVAR(n)
corresponding to X is set to '1'B. (VSUBASS produced by Analysis
4.9 is used to determine this information for nodes corresponding
to subroutine invocations.) Members of this set of vectors are
used as sinks in Analysis 4.11.


Apply PROP_ALL to the flow graphs of the subroutines of
the program independent of the call graph structure with

in: VECTOR0

Property P: "a compute point of expression X has
been encountered"

Sources of P: VCOMEXP

Sinks of P: VDEFVAR

X: "succ"

Attribute A: "expression X is recalculable"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VRECALC.


Analysis 4.11
DETERMINE NODES AT WHICH X IS RECALCULABLE

Detection Outline

S1: Perform the prepass in order to determine candidate
expressions.

First, perform Analysis 4.9 obtaining VSUBASS as
output. (This is necessary to determine VDEFVAR
produced by the prepass.) The prepass defines two
sets of vectors. VCOMEXP encodes the compute points
of candidate expressions. VDEFVAR encodes assign
points of defining variables of the candidate
expressions.

S2: Determine nodes at which expressions are recalculable.
Perform Analysis 4.11 obtaining VRECALC as output.

S3: Determine the position of removable common expressions.
For any node, n, of the flow graph, if the result of
ANDing VCOMEXP(n) and VRECALC(n) is nonzero, then
there exists a removable common expression at that
node.
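PROP_ALL is defined earlier in the thesis; the sketch below assumes it behaves like a conventional all-paths (available-expressions) fixpoint. Bit vectors are Python ints, the entry node starts from VECTOR0, and at each node the sources (VCOMEXP) are ORed in and the sinks (VDEFVAR) are cleared afterwards, so an expression whose defining variable is assigned at the same node does not survive. The second function applies step S3. Names are illustrative.

```python
def recalculable(nodes, preds, entry, source, sink, width):
    """All-paths propagation (an assumed reading of PROP_ALL).

    nodes: node ids, ideally in reverse postorder; preds: node -> predecessors;
    source/sink: node -> int bit vector (VCOMEXP / VDEFVAR).
    Returns VRECALC as a dict: the bits recalculable on entry to each node.
    """
    full = (1 << width) - 1
    rin = {n: (0 if n == entry else full) for n in nodes}   # entry: VECTOR0
    rout = {n: (rin[n] | source[n]) & ~sink[n] & full for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n != entry:
                meet = full
                for p in preds[n]:
                    meet &= rout[p]      # property must hold on every path
                rin[n] = meet
            new_out = (rin[n] | source[n]) & ~sink[n] & full
            if new_out != rout[n]:
                rout[n] = new_out
                changed = True
    return rin


def removable_nodes(nodes, vcomexp, vrecalc):
    """Step S3: nodes at which VCOMEXP(n) AND VRECALC(n) is nonzero."""
    return [n for n in nodes if vcomexp[n] & vrecalc[n]]
```

Starting non-entry nodes at all ones and only ever ANDing makes the vectors shrink monotonically, so the loop terminates.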

Time And Space Summary

Two elementary global flow analyses are performed requiring
the use of VVARASS, VCOMEXP, and VDEFVAR. VSUBASS and VRECALC are
produced as output. A prepass of the nodes is required to
determine the content of VCOMEXP and VDEFVAR. A linear scan of the
nodes is used to find the removable common expressions.


4.1.6 Transfer Variables


The concept of a transfer variable deals with the use of an
unneeded assignment to a variable in order to transfer data through
the program. The assignment is unnecessary in the sense that it
can be eliminated by replacing the pertinent references to the
variable with the expression involved in the assignment. An
example of such a transfer is

B    =     * 

A = B + 1.

The value of a variable V obtained at a specific assign point,
n, is available at node m iff there exists a path from node n to m
that contains no other assign point of V. A variable V assigned in
an assignment statement (V <- <X>) at node n is used as a transfer
variable if:

• all reference points of V at which X is recalculable also
have V (assigned at n) available, and

• the only V available at each of these reference points is
the V assigned at node n.

Note that the definitions above specifically refer to the node in
which the expression is computed and the variable is assigned.
Thus, for the purposes of this analysis, each instance of an
expression that occurs several times in the program is considered
to be a separate expression. Similarly, each instance of a compute
point of a variable is considered to apply to a separate variable.

Within a given subroutine, S, variables defined at specific
assign points may be available from both subroutines that call S
and subroutines called by S. For this reason, both a bottom-up and
a top-down analysis are needed to collect all information about
available variables.


Apply PROP_SOME in a bottom-up manner with

in: VECTOR0

Property P: "a specific assign point of variable V
has been encountered"

Sources of P: VASSPNT (modified by the call graph
analysis)

Sinks of P: VUNAVAIL

X: "succ"

Attribute A: "the specific assign point of variable
V is available"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VAVAIL2. The
modified VASSPNT set will subsequently be referenced by
VASSPNT'.

Analysis 4.12
BOTTOM-UP ASSIGN POINTS AVAILABLE


In order to determine nodes at which specific assign points of
variables are available, Analysis 4.1, which determines assign
points (i.e., sinks) within subroutines, must be performed. Once
Analysis 4.1 has been performed, a new set of vectors, VUNAVAIL, is
created. Each position of this vector set corresponds to a
specific assign point of the variables (the same correspondence as
in VASSPNT). For each variable, V, a vector ASSIGN(V) is defined
in which all positions corresponding to any assign point of V are
set to '1'B. If node n is an assign point of V, then
VUNAVAIL(n) = ASSIGN(V). VUNAVAIL is used as the set of sinks in
Analyses 4.12 and 4.13, which determine the points at which specific
assign points of variables are available.
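The construction of ASSIGN(V) and VUNAVAIL can be sketched as below, assuming the specific assign points are supplied as a list of (node, variable) pairs whose list positions give the VASSPNT bit correspondence. OR-ing the vectors of a node that assigns several variables is an assumption; the text only describes a node with a single assign point. Names are hypothetical.

```python
def build_vunavail(assign_points):
    """assign_points: list of (node, variable), one entry per specific assign
    point; list position i is bit i (the VASSPNT correspondence).
    Returns (ASSIGN, VUNAVAIL) as dicts of int bit vectors."""
    assign = {}
    for i, (_, var) in enumerate(assign_points):
        assign[var] = assign.get(var, 0) | (1 << i)   # every assign point of V
    vunavail = {}
    for node, var in assign_points:
        # VUNAVAIL(n) = ASSIGN(V); ORed if n assigns more than one variable.
        vunavail[node] = vunavail.get(node, 0) | assign[var]
    return assign, vunavail
```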


Apply PROP_SOME in a top-down manner with

in: inserted by the call graph analysis

Property P: "a specific assign point of variable V
has been encountered"

Sources of P: VASSPNT'

Sinks of P: VUNAVAIL

X: "succ"

Attribute A: "the specific assign point of variable
V is available"

The lattice information (attribute A) attached to the
nodes becomes a new set of vectors, VAVAIL1.


Analysis 4.13
TOP-DOWN ASSIGN POINTS AVAILABLE
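PROP_SOME is defined earlier in the thesis; the sketch below assumes it is a conventional any-path (reaching-definitions style) fixpoint on a single flow graph and ignores the bottom-up and top-down call graph passes. Sources are the VASSPNT bits of each node and sinks are the VUNAVAIL vectors; sinks are applied before sources so that a node's own assign point survives. Names are illustrative.

```python
def prop_some(nodes, preds, source, sink):
    """Any-path propagation over int bit vectors.

    IN(n) = OR of OUT(p) over the predecessors p of n
    OUT(n) = (IN(n) & ~sink(n)) | source(n)
    Vectors start at zero (VECTOR0). Returns IN, the assign points
    available on entry to each node.
    """
    rin = {n: 0 for n in nodes}
    rout = {n: source[n] for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = 0
            for p in preds[n]:
                new_in |= rout[p]        # property holds on some path
            new_out = (new_in & ~sink[n]) | source[n]
            if new_in != rin[n] or new_out != rout[n]:
                rin[n], rout[n] = new_in, new_out
                changed = True
    return rin
```

Under this reading, VAVAIL(n) in step S1 below would be the OR of two such results, one per analysis.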
Detection Outline

S1: Determine those regions of the flow graph in which
specific assign points of variables are available.

Perform Analysis 4.1 (to determine VVARASS' used to
define VUNAVAIL). Perform Analyses 4.12 and 4.13,
which produce VAVAIL2 and VAVAIL1, respectively.
The set of specific assign points available at node
n is VAVAIL(n) = VAVAIL1(n) | VAVAIL2(n).

S2: Determine those regions of the flow graph in which
specific compute points of expressions (assigned to
variables) are recalculable.

This is determined in a manner similar to that of
Analysis 4.11 where VCOMPNT* (the truncated form of
VCOMPNT, padded on the right with zeros for
expressions that do not have a corresponding
assignment variable) is used as sources. Assume
that the recalculable information is encoded in
VRECALC*.

S3: Let R be the set of all assign points in the program.

Store R as a bit vector of all '1'B (with the same
correspondence as that in VASSPNT').

S4: Determine the assign points that define transfer
variables.

Perform the actions specified by the code in Figure
4.2. Upon completion, the set R defines the assign
points of transfer variables.
for every q ∈ G
   for every V referenced at q   /* determined by VVARREF */
      /* assign points of V available at q */
      TV <- ASSIGN(V) & VAVAIL(q);
      if ⌈TV has more than one bit on⌉ then
         R <- R & (¬TV)
      else R <- R & (¬(TV & (¬VRECALC*(q))));
      fi
   rof;
rof;


Figure 4.2
FIND ASSIGN POINTS DEFINING TRANSFER VARIABLES
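Figure 4.2 can be transcribed into Python as a sketch, assuming all vectors are ints with one bit per specific assign point (the VASSPNT' correspondence) and that VAVAIL and VRECALC* have already been computed. The dictionaries and the function name are hypothetical.

```python
def transfer_assign_points(nodes, refs, assign, vavail, vrecalc_star, width):
    """Figure 4.2 over int bit vectors (one bit per specific assign point).

    refs: node q -> variables referenced at q (from VVARREF)
    assign: variable V -> ASSIGN(V)
    vavail: q -> VAVAIL(q) = VAVAIL1(q) | VAVAIL2(q)
    vrecalc_star: q -> VRECALC*(q)
    Returns R, the assign points of transfer variables.
    """
    full = (1 << width) - 1
    r = full                                   # S3: every assign point
    for q in nodes:
        for v in refs.get(q, []):
            tv = assign[v] & vavail[q]         # assign points of V available at q
            if bin(tv).count("1") > 1:
                r &= ~tv & full                # more than one V reaches q
            else:
                r &= ~(tv & ~vrecalc_star[q]) & full  # X not recalculable at q
    return r
```

For example, if the only definition of B reaching its reference has its expression recalculable there, B's assign point stays in R; if two definitions of B reach the same reference, both are removed.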


Time And Space Summary

Four elementary global flow analyses are performed requiring
the use of VVARASS, VASSPNT, VUNAVAIL, VAVAIL1 and a variant of
VCOMPNT. A linear scan of the nodes is used to determine the
assign points corresponding to transfer variables.


4.1.7 Summary Of Property P Propagation


The detection schemes presented in Section 4.1 are intended to
serve as verification that property P propagation is sufficient to
detect a large number of program anomalies. It is not necessarily
intended that all these detection schemes be incorporated into one
analysis system. In the development of an analysis system for a
specific language, the designer should selectively pick those
anomalies that occur most frequently within the language and
concentrate his efforts there. Such a set of anomalies would
probably include some of those presented in this thesis, but may
also include other anomalies that are more specific to the language
itself.


[Diagram: three columns, INPUT (sinks & sources), ANALYSIS, and OUTPUT (produced by analysis), linking each elementary analysis (4.1 to 4.13) to the vector sets it uses and the vector sets it produces.]


Figure 4.3
VECTOR SET INTERDEPENDENCE
Figure 4.3 presents a summary of the interdependences of the
vector sets. During the execution of any particular elementary
global flow analysis (or sequence of such) only a small number of
vector sets need be in immediately accessible memory at any given
time.


4.2 Applications Of Invariant Assertion Analysis

This section discusses a detection scheme for each of the
program defects presented in Section 2.7. Each of these defects
can be detected upon completion of the invariant assertion analysis
presented in Section 3.2.2.1, and each of the following subsections
assumes this analysis has been performed.


4.2.1 Subscript Out Of Array Bounds

The invariant assertion analysis has produced an assertion for
each node of the flow graph. At each array reference the value of
the subscript can be analyzed, within the context of the assertion
attached to the node in which the array reference occurs, to
determine if the subscript is in bounds. Consider the program
segment in Figure 4.4; the assertions are supplied at the right.

Assume A is dimensioned 10 elements long. By interrogating
the assertions attached to the nodes, it can be determined that the
array reference at S4 is in bounds and the array reference at S7 is
out of bounds.


Statement                        Assertion

S1      READ, A                  [TRUE]
S2      I = 1                    [TRUE]
S3      SUM = 0                  [I=1]
S4  10  SUM = SUM + A(I)         [I≥1] ∧ [I≤10] ∧ ([SUM=0] ∨ [I>1])
S5      I = I + 1                [I≥1] ∧ [I≤10]
S6      IF (I.LE.10) GOTO 10     [I>1]
S7      PRINT, A(I)              [I>10]


Figure 4.4
ARRAY REFERENCE OUT OF BOUNDS


However, not all array out of bounds conditions are as easily
decided. Assume an array reference to A(I) between S5 and S6 and
that the corresponding assertion is [I>1]. Of course, such an
array reference would be out of bounds on the final pass through the
loop (when I = 11), but the assertion [I>1] does not guarantee this
fact; it only indicates a potential out of bounds reference.
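A sketch of the bounds test, assuming the assertion attached to the node has already been reduced to a closed integer interval [lo, hi] for the subscript, with math.inf marking a missing bound. That reduction is an assumption; the assertions produced by the analysis are more general. For Figure 4.4, S4 gives [1, 10], S7 gives [11, inf) since [I>10] holds for an integer I, and the reference between S5 and S6 gives [2, inf).

```python
import math


def classify_subscript(lo, hi, lower_bound, upper_bound):
    """Classify an array reference whose subscript is known to lie in [lo, hi].

    lower_bound/upper_bound are the declared array bounds; math.inf or
    -math.inf stands for a bound that the assertion does not supply.
    """
    if lo >= lower_bound and hi <= upper_bound:
        return "in bounds"
    if hi < lower_bound or lo > upper_bound:
        return "out of bounds"
    return "potentially out of bounds"


# Figure 4.4 with A(1:10): S4, S7, and a reference between S5 and S6.
s4 = classify_subscript(1, 10, 1, 10)
s7 = classify_subscript(11, math.inf, 1, 10)
between = classify_subscript(2, math.inf, 1, 10)
```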


4.2.2 Parameter Of A FORTRAN DO Loop ≤ 0

The ANSI definition of FORTRAN states that all parameters of a
DO loop must be positive. After the invariant assertion analysis
has been performed, if it can be determined that a parameter of a
DO loop is non-positive, an appropriate message can be presented.
Consider the program segment in Figure 4.5. J, used as a parameter
of the DO loop at S3, is non-positive as indicated by the attached
assertion.


Statement                      Assertion

S1      DO 20 I = 1,10         [TRUE]
S2      J = I - 10             [I≥1] ∧ [I≤10] ∧ ([I=1] ∨ [J≥-9])
                               ∧ ([I=1] ∨ [J≤0]) ∧ ([I=1] ∨ [K≥1])
S3      DO 20 K = 1,J          [I≥1] ∧ [I≤10] ∧ [J≥-9] ∧ [J≤0]
                               ∧ ([I=1] ∨ [K≥1])
S4      B(I) = B(I)*A(I,K)     [I≥1] ∧ [I≤10] ∧ [J≥-9] ∧ [J≤0]
                               ∧ [K=1]
S5   20 CONTINUE               [I≥1] ∧ [I≤10] ∧ [J≥-9] ∧ [J≤0]
                               ∧ [K=1]

Figure 4.5
DO LOOP PARAMETER CODE
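The DO-parameter check can be sketched the same way, again assuming the assertion has been reduced to an integer interval [lo, hi] for the parameter. For S3 of Figure 4.5, J lies in [-9, 0]. The function name and messages are illustrative.

```python
def check_do_parameter(name, lo, hi):
    """Check a DO-loop parameter known to lie in [lo, hi].

    ANSI FORTRAN requires DO parameters to be positive, so a parameter
    that can be zero or negative is reported.
    """
    if hi <= 0:
        return f"{name} is non-positive"       # every possible value <= 0
    if lo <= 0:
        return f"{name} may be non-positive"   # some possible value <= 0
    return None
```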


4.2.3 Division By Zero


Detection of division by zero can be accomplished in one of
two ways. The function EXECUTE (of Section 3.2.2.1) must estimate
the value of expressions involving division. A test can be
incorporated into EXECUTE to check if the divisor of a given
division is zero and flag the situation for further comment to the
student.


Statement                 Assertion

S1      A = 5             [TRUE]
S2      B = -5            [A=5]
S3      C = A/(A+B)       [A=5] ∧ [B=-5]

Figure 4.6
DIVISION BY ZERO CODE


If for some reason it is undesirable to include such a feature
within EXECUTE, then a separate analysis can be performed upon
completion of the invariant assertion analysis. Such an analysis
must evaluate (if possible) the divisor of each division operator
within the context of the assertion attached to the corresponding
node. If the evaluation of the divisor results in zero, an
appropriate message can be presented.

Consider the program segment in Figure 4.6. The division by
zero in S3 can be detected by either of these schemes.
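The separate analysis described above can be sketched as a walk over each expression that evaluates every divisor under the constant values fixed by the node's assertion. Representing the assertion as a dictionary of known constants, and expressions as nested tuples, are assumptions made for illustration; an unknown value makes the divisor unknown, so nothing is reported for it.

```python
# Arithmetic for the operators whose results the sketch can fold.
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}


def value(expr, env):
    """Value of expr under the constants in env, or None if unknown."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env.get(expr)        # None if the assertion does not fix it
    op, left, right = expr
    a, b = value(left, env), value(right, env)
    if a is None or b is None or op not in OPS:
        return None                 # unknown operand, or a nested division
    return OPS[op](a, b)


def zero_divisions(expr, env):
    """Divisor subtrees of expr that evaluate to zero under env."""
    if not isinstance(expr, tuple):
        return []
    op, left, right = expr
    hits = zero_divisions(left, env) + zero_divisions(right, env)
    if op == "/" and value(right, env) == 0:
        hits.append(right)
    return hits
```

At S3 of Figure 4.6 the assertion fixes A = 5 and B = -5, so the divisor A+B evaluates to zero and is reported.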


4.2.4 Unnecessary Testing

Unnecessary testing deals with the testing of a condition
that, at the point of the test, is either uniformly TRUE or FALSE.
The general detection scheme is as follows.

For each conditional test in the program, determine the
condition, C, being tested and the assertion, A, attached to
the node. If A => C, then condition C must be true and the
corresponding branch will always be taken. If A => ¬C, then
condition C must be false and the corresponding branch is
never taken (and can, thus, be eliminated). If neither of
these conditions holds, then more than one branch of the
conditional test can occur.

Consider the program segment in Figure 4.7. The condition, D = 3,
tested at S2 is uniformly false and S2 can be eliminated from the
program.
Statement                      Assertion

S1      IF (D.EQ.3) GOTO 10    (A)
S2      IF (D.EQ.3) B = 5      [D≠3] ∧ (A)

Figure 4.7
UNNECESSARY TEST CODE
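The implication tests A => C and A => ¬C can be illustrated for a single tested variable by checking every value in a small finite domain. Exhaustive checking over a fixed domain is an illustrative assumption only; a practical implementation would reason over the assertions symbolically. With A = [D≠3] and C = [D=3], the condition tested at S2 is uniformly false.

```python
def classify_test(assertion, condition, domain):
    """Decide whether condition C is uniformly TRUE or FALSE under assertion A.

    assertion, condition: predicates on the value of the tested variable.
    A => C is checked exhaustively over the finite domain, an assumption
    made for illustration only. If no value satisfies A (the assertion
    [FALSE]), the implication holds vacuously and "always true" is returned.
    """
    possible = [v for v in domain if assertion(v)]
    if all(condition(v) for v in possible):
        return "always true"      # A => C: branch always taken
    if not any(condition(v) for v in possible):
        return "always false"     # A => not C: branch never taken
    return "either branch"
```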

4.2.5 Non-executable Code


A control path in a flow graph is a sequence of nodes in which
the first node is the entry node, e, and the nth node of the
sequence is a predecessor (in the flow graph) of the n+1st node of
the sequence. An execution path is a control path for which the
following semantic condition is true:

the execution conditions produced by the execution of the
corresponding program do not exclude the execution of the
statement corresponding to the n+1st node upon completion of
the execution of the statement corresponding to the nth node.

There may exist nodes of the flow graph that are not in any
execution path. This section presents a technique for determining
regions of the flow graph that are contained in no execution path.

Statement                        Assertion

S1      I = 1                    [TRUE]
S2      SUM = 0                  [I=1]
S3  10  SUM = SUM + A(I)         [I≥1] ∧ ([I>1] ∨ [SUM=0])
S4      I = I + 1                [I≥1]
S5      IF (I.GE.0) GOTO 10      [I>1]
S6      PRINT, SUM               [FALSE]

Figure 4.8
NON-EXECUTABLE CODE


Consider the program segment in Figure 4.8. Note that the
assertion attached to S6 is FALSE. This fact guarantees that S6 is
not a member of any execution path; i.e., S6 will never be
executed.

Observation 4.1

Any node, n, that, after application of the invariant assertion
analysis of Section 3.2.2.1, has the assertion FALSE attached to it,
is not a member of any execution path.

Before proving this observation, consider the following lemma.

Lemma 4.1

During the execution of ASSERT_GEN, if a node, n, has an input
assertion D ≠ [FALSE] and an output assertion A = [FALSE] for a
given branch R, then

1) node n corresponds to a conditional test, and

2) the execution assertion for branch R (EQ3.1 in Section 3.2.2.1)
is inconsistent with the input assertion, D; i.e., the
corresponding branch R is not taken.


Proof of Lemma 4.1:


After analyzing EQ3.1 (which depends on EQ3.2, EQ3.3 and Table
3.9), it is evident that the only way of obtaining an output
assertion of [FALSE] given an input assertion D ≠ [FALSE] is by
application of the first line of EQ3.1. This implies that the
execution assertion is a <rel assert>, which is only generated for
a conditional test (proving 1). Since the output assertion, the
conjunction of the execution assertion and D, is [FALSE], the
execution assertion is inconsistent with the input assertion; i.e.,
the execution of the corresponding branch produces an assertion
that cannot be true, given the input assertion. Thus the
corresponding branch is not taken (proving 2).

Proof of Observation 4.1:


Since node n has an input assertion of [FALSE], all of its
predecessors must propagate output assertions of [FALSE]. Since
the input assertion attached to the entry node is [TRUE], every
control path containing node n contains a node, p, with a non-FALSE
input assertion and a FALSE output assertion (for example, the last
node on the path before n whose input assertion is not [FALSE]).
This node p satisfies the conditions of Lemma 4.1, and, thus, is a
conditional test for which the corresponding branch cannot be taken.
Thus, every control path containing node n contains a conditional
test with a branch that cannot be taken; i.e., n is not a member of
any execution path.
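Lemma 4.1 can be illustrated with interval assertions on a single integer variable, where AND is interval intersection and an empty intersection is [FALSE]. This representation is an assumption made for illustration. At S5 of Figure 4.8 the input assertion is [I>1]; intersecting it with the fall-through condition I < 0 gives [FALSE], which is the assertion that reaches S6.

```python
import math

FALSE = None  # the assertion [FALSE]: no value of the variable is possible


def meet(a, b):
    """AND of two interval assertions (lo, hi) on the same integer variable."""
    if a is FALSE or b is FALSE:
        return FALSE
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else FALSE


# Figure 4.8, node S5: input assertion [I>1], i.e. I in [2, inf).
d = (2, math.inf)
taken = meet(d, (0, math.inf))           # I.GE.0 true: branch back to 10
fall_through = meet(d, (-math.inf, -1))  # I.GE.0 false: on to S6
```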




5 IMPLEMENTATION


This chapter describes implementation aspects of important
components of an analysis system that employs the global flow
techniques presented in Chapters 3 and 4. Section 5.1 presents a
general interactive compiling system framework into which these
global flow detection techniques can be incorporated. The general
properties of the system components are discussed but
implementation details are not presented. The remainder of the
chapter addresses detailed implementation aspects of the global
flow analysis components of the system.


5.1 Compiler/Analysis System Overview

The diagram in Figure 5.1 describes the general control
structure of an interactive compiler system incorporating the
analysis-detection system presented in this thesis. The design
presented here is only an example of how an analysis system might
be used as a coroutine with a compiler system. This particular
design is based on the following assumptions.

1) The compiler system is already implemented and only minor
changes can be made to it in order to accomplish the
interface; i.e., none of its underlying design or data
structures can be modified.

2) Only small programs, of, say, thirty statements or less,
are to be passed across the interface.

3) Input from the student is accepted from a keyboard.

4) Output is displayed to the student on an interactive
screen.


Figure 5.1
SYSTEM CONTROL STRUCTURE


Compiler-translator   System 

The compiler system (at left in Figure 5.1) is assumed to be
similar to that described in [WIL76]. Apart from the following
three essential requirements, the compiler system will not be
discussed further.

1) The SYMBOL TABLE and INTERMEDIATE TEXT must contain
sufficient information for the analysis system to extract
what it needs.

2) The compiler system must be able to accept input source
text from an intermediate file produced by the analysis
system.

3) The compiler system must have some facility for the user to
request the invocation of the analysis system.

Note that the interface between the compiler system and the
analysis system is quite narrow. When control is passed from the
compiler system to the analysis system, the INPUT DATA INTERFACE
module is invoked, which creates an ANALYSIS SOURCE TEXT file from
the compiler SYMBOL TABLE and INTERMEDIATE TEXT. Before control is
returned to the compiler system, the OUTPUT DATA INTERFACE module
is invoked, which creates a source text file to be accessed by the
compiler system. This design of separating what might have been a
common data base was chosen so that a change in the data structure
of one of the two systems does not adversely affect the underlying
structure of the other system. As a change is incorporated into
one system, only the INPUT DATA INTERFACE module and OUTPUT DATA
INTERFACE module need be modified. Such a design also ensures that
each system need only work with data pertinent to its own analysis,
instead of data imposed upon it because of its importance to the
other system.

Analysis System

The analysis system has ten process modules. The INPUT DATA
INTERFACE module and OUTPUT DATA INTERFACE module, as discussed
above, are used to transform the data base used by one system into
the data base used by the other system. No useful analysis is
performed in these modules, and they will not be discussed further.
The major responsibilities of each of the remaining modules are
briefly discussed below.

ANALYSIS MONITOR

The ANALYSIS MONITOR coordinates the execution of all other
modules of the analysis system. It first invokes the INPUT DATA
INTERFACE module, which creates the ANALYSIS SOURCE TEXT used to
produce the flow graph of the student's source program. It then
invokes the GRAPH GENERATOR and STRUCTURE ANALYZER, which create
the underlying flow graph(s) and determine the ordering of the
nodes used by the GLOBAL FLOW ANALYSIS module. The INPUT/OUTPUT
HANDLER is then invoked to accept commands entered on the keyboard
by the student.
The student can make a variety of requests. These may vary with
different system designs, but the following set of commands is
suggested:

1) The student can request information on the operation of the
analysis system. Such information should include basic
instruction on the operation of the analysis system and
help sequences for specific analyses available to the
student.

2) He can request to return to the compiler system, in which
case the OUTPUT DATA INTERFACE module is invoked and
control is returned to the compiler system.

3) He can edit his program, in which case the STATEMENT EDITOR
is invoked. (It is suggested that only limited editing
facilities be made available within the analysis system.)

4) He can request an analysis (or collection of analyses) to
be performed, in which case the ANALYSIS CONTROLLER module
is invoked.

5) He can request that program segments and currently pending
messages be replotted on the screen.

Besides the general coordination and invoking of the processing
modules, the ANALYSIS MONITOR is responsible for:

• initializing the LATTICE SPACE DATA BASE,

• interpreting the results of the specific global flow
analyses, and

• creating and formatting messages to the student.
INPUT/OUTPUT HANDLER

This  module  is  the  interface  between  the  student  and  the 
analysis  system.   It  has  the  following  functions: 

1) It accepts commands from the student's keyboard.

2) It displays segments of source program text (in
coordination with the ANALYSIS MONITOR) on the display
screen.

3) It displays messages and annotations produced as a result
of the various global flow analyses.

4) It displays text providing basic instructions for the
operation of the analysis system.

GRAPH  GENERATOR 

The  GRAPH  GENERATOR  module  scans  the  ANALYSIS  SOURCE  TEXT  to 
produce  the  basic  flow  graph  structure  corresponding  to  the 
student's  source  program.   This  module  must  have  a  basic  knowledge 
of the source language in order to determine the type of statement
to  which  each  node  corresponds  and  to  determine  the  successors  and 
predecessors  of  each  node. 

STRUCTURE  ANALYZER 

This module applies the DFST algorithm to the flow graph in
order to determine the reverse postorder of the nodes. It also
generates the call graph used to coordinate the separate analyses
of the component subroutines. It is invoked immediately after a
sequence of editing functions and before any further analysis is
performed.
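The ordering step of the STRUCTURE ANALYZER can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the flow graph is a Python dict mapping each node to its list of successors, and the function name `reverse_postorder` is hypothetical. A depth-first search from the entry node records each node when it finishes; reversing that postorder gives the order in which the global flow algorithm visits the nodes.

```python
from typing import Dict, Hashable, List

def reverse_postorder(succ: Dict[Hashable, List[Hashable]],
                      entry: Hashable) -> List[Hashable]:
    """Return the nodes reachable from `entry` in reverse postorder."""
    visited = set()
    postorder: List[Hashable] = []

    def dfs(node: Hashable) -> None:
        visited.add(node)
        for s in succ.get(node, []):
            if s not in visited:      # tree edge of the depth-first spanning tree
                dfs(s)
        postorder.append(node)        # a node finishes after all its descendants

    dfs(entry)
    postorder.reverse()
    return postorder

# A diamond-shaped graph: A branches to B and C, which rejoin at D.
order = reverse_postorder({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}, "A")
print(order)  # ['A', 'C', 'B', 'D']
```

Nodes unreachable from the entry are simply not numbered; recursion is used only for brevity and could be replaced by an explicit stack for large graphs.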
STATEMENT EDITOR

The student should be allowed to edit his program within the
control of the analysis system. The editing facilities need not be
extensive, but such a facility should be present so that the
student need not return to the compiler system to perform
elementary editing functions. Arbitrary editing facilities should
probably not be provided, because this would imply that the
knowledge of the compiler-translator would have to be duplicated
within the analysis system.

The following editing functions are suggested:

• the ability to delete a statement,

• the ability to move a statement, and

• the ability to have the analysis system implement a
transformation it has suggested.

Even this limited editing facility requires the STATEMENT EDITOR to
have some knowledge of the source language.
As the STATEMENT EDITOR module performs the editing functions
requested by the student, the FLOW GRAPH and corresponding ANALYSIS
SOURCE TEXT must be updated to reflect the editing change. After a
sequence of editing commands and before any analysis can be
performed, the STRUCTURE ANALYZER module must be invoked to reflect
the new structure of the FLOW GRAPH.


II 


it 
GLOBAL FLOW ANALYSIS

This module is invoked to perform one of the elementary global
flow analyses, i.e., PROP_ALL, PROP_SOME, or ASSERT_GEN. The
LATTICE SPACE DATA BASE and FLOW GRAPH are accessed, and the LATTICE
SPACE DATA BASE, in which the lattice information corresponding to
the nodes is held, is modified during the execution of the
analysis. The ANALYSIS CONTROLLER module, which invokes the GLOBAL
FLOW ANALYSIS module, must coordinate the separate elementary
global flow analyses and the data used by them to successfully
complete the analysis requested by the student.

ANALYSIS  CONTROLLER 

This module is invoked when the student requests an analysis
to be performed. All of the detection outlines presented in
Chapter 4 are coded within the ANALYSIS CONTROLLER module. A given
analysis may require several invocations of the GLOBAL FLOW
ANALYSIS module to complete the entire analysis. It is the job of
the ANALYSIS CONTROLLER module to coordinate the data maintained
within the LATTICE SPACE DATA BASE and to perform successive
invocations of the GLOBAL FLOW ANALYSIS module in the correct
sequence to complete the required analysis.

DIRECTIONS  AND  HELP 

This module is invoked when the student asks for instructions
on how to operate the analysis system. It presents information
about the different options available to the student and how to
best utilize the system.
Analysis Coordination

Several analyses depend on a specific elementary analysis that may
or may not have already been performed on the current source
program. For instance, the uninitialized variable analysis of
Section 4.1.2 requires that Analysis 4.1 be performed in order to
obtain VECTOR3'. If Analysis 4.1 has already been performed on the
current source program, say because an unreferenced data analysis
was previously requested, and the results of that analysis have
been saved, then Analysis 4.1 need not be repeated; the previously
saved results can be used.


A simple mechanism can be employed to recognize when a
required analysis has already been performed. With each specific
elementary global flow analysis a bit is associated, which is
initially set to '0'B. Whenever an elementary analysis is
performed, its corresponding bit is set to '1'B. Whenever an
editing function is performed, all bits are set to '0'B. During
the execution of a detection outline, when a specific analysis is
to be performed, if the corresponding bit is '0'B, then the
analysis must be performed; otherwise the analysis need not be
performed (and the previously stored results can be used).
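A minimal sketch of this bookkeeping, assuming one Python integer serves as the bit set and each elementary analysis is identified by a small index; the class `AnalysisCache`, its methods, and the `run` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

class AnalysisCache:
    """One validity bit per elementary analysis, packed into an integer."""

    def __init__(self) -> None:
        self.valid = 0                          # every bit starts as '0'B
        self.results: Dict[int, List[int]] = {}

    def get(self, k: int, run: Callable[[], List[int]]) -> List[int]:
        """Return the results of analysis k, running it only if needed."""
        if ((self.valid >> k) & 1) == 0:        # bit is '0'B: must run it
            self.results[k] = run()
            self.valid |= 1 << k                # set the bit to '1'B
        return self.results[k]                  # otherwise reuse stored results

    def program_edited(self) -> None:
        self.valid = 0                          # any edit resets all bits

calls: List[str] = []

def analysis_4_1() -> List[int]:
    calls.append("4.1")                         # record each real execution
    return [0b101, 0b011]

cache = AnalysisCache()
cache.get(0, analysis_4_1)
cache.get(0, analysis_4_1)     # saved results are reused
cache.program_edited()
cache.get(0, analysis_4_1)     # rerun after the edit
print(len(calls))              # 2
```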


5.2  Flow  Graph 

The nodes and edges of the flow graph are the basic data
structures upon which all of the global flow analyses are based.
The node structure described below constitutes the minimal node
structure required for implementation. The fields of the nodes are
as follows:

1) Type of node.

This field identifies the type of source statement to which
the node corresponds (e.g., assignment statement, transfer
of control, or conditional test). The exact number of
types is language dependent, and a variety of different
codings can be used to represent the different types.

2) List of successors.

This field is a pointer to a list of successors of the
node. The edges of the flow graph are thus encoded as
linked lists of nodes. Additional information may be
retained along with the node information. For instance,
additional information about the edges is necessary to
generate execution assertions for the different successors
of a conditional test during the invariant assertion
analysis.

3) List of predecessors.

4) Source pointer.

A pointer must be maintained to identify the source
statement to which the node corresponds. In most
implementations, this will point to an intermediate text
form of the source statement (assuming that the content and
position of the actual source statement can be recovered
from it).

5) Reverse postorder.

The ordering determined by the DFST algorithm is held in
this field. For efficient execution, an array that encodes
this same information is also maintained so that nodes can
be processed quickly in the desired order.

6) Pointer to lattice element.

This is a pointer to the particular lattice element
currently attached to the node. In the case of property P
propagation, it is a pointer to an M-bit vector. In the
case of invariant assertion analysis, it is a pointer to an
assertion.

7)  Flow  function  pointer. 

The  evaluation  of  the  flow  function  is  based  on  information 
associated  with  the  node.   In  the  case  of  property  P 
propagation,  two  pointers  are  required  to  point  to  the 
vectors  that  encode  source  and  sink  information.   In  the 
case  of  invariant  assertion  analysis,  a  list  of  pointers  is 
required,  each  of  which  points  to  an  expression  tree  of  an 
expression  in  the  statement. 

8) Source order pointer.

The order in which the original source text was presented
by the student should be encoded within the flow graph.
This information is required when the student is editing
his program. For instance, if the student deletes a GOTO
statement, then the flow graph must be modified to reflect
the new flow of control, which depends on the original
order of the source text.

The edges of the flow graph can be implicitly specified at the
nodes by maintaining a list of successors and predecessors at each
node.
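The eight fields can be sketched as a record. This is an illustrative layout only: Python lists stand in for the linked lists, bit vectors are Python integers, the source pointer is an index into the intermediate text, and every field and type name is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional

class NodeType(Enum):
    # Hypothetical coding; the real set of node types is language dependent.
    ASSIGNMENT = auto()
    TRANSFER = auto()
    CONDITIONAL_TEST = auto()

# repr and eq are disabled because nodes refer to each other cyclically.
@dataclass(eq=False, repr=False)
class FlowNode:
    kind: NodeType                                         # 1) type of node
    succ: List["FlowNode"] = field(default_factory=list)   # 2) successors
    pred: List["FlowNode"] = field(default_factory=list)   # 3) predecessors
    source: Optional[int] = None     # 4) index of the intermediate-text statement
    rpo: int = -1                    # 5) reverse postorder number
    lat: int = 0                     # 6) lattice element, here an M-bit vector
    sources: int = 0                 # 7) flow-function data: source vector
    sinks: int = 0                   #    and sink vector
    next_in_source: Optional["FlowNode"] = None   # 8) original source order

def add_edge(a: FlowNode, b: FlowNode) -> None:
    """Record the edge a -> b at both end points; edges are not stored separately."""
    a.succ.append(b)
    b.pred.append(a)

test = FlowNode(NodeType.CONDITIONAL_TEST, source=10)
body = FlowNode(NodeType.ASSIGNMENT, source=11)
add_edge(test, body)
add_edge(body, test)                  # back edge closing a loop
print(len(test.succ), len(test.pred))  # 1 1
```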
Induced Nodes

The concepts presented in this thesis address only basic
statements (i.e., I/O, assignment, subroutine invocation, and
transfer of control). For these concepts to be applied within a
language containing high level constructs (such as DO loops, WHILE
loops, etc.), each high level construct should induce a set of
nodes corresponding to its implied low level constructs. For
instance, a FORTRAN DO statement induces a node that assigns the
value of the first parameter to the index variable of the DO loop;
the corresponding CONTINUE statement induces a node that increments
the index variable of the DO loop and a second node that compares
the index variable to the upper bound.
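As a sketch of the DO-loop case just described, the function below produces the three induced nodes for `DO label index = first, upper` and its terminal CONTINUE, each remembering the statement that induced it so that messages can be phrased in the student's terms. It assumes a step of 1 and a test at the bottom of the loop, as the description implies; the record layout and names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InducedNode:
    op: str            # "assign", "increment" or "compare"
    text: str          # low-level statement the node stands for
    induced_by: str    # source statement reported to the student

def induce_do_loop(label: str, index: str, first: str, upper: str) -> List[InducedNode]:
    """Nodes induced by 'DO label index = first, upper' and 'label CONTINUE'."""
    do_stmt = f"DO {label} {index} = {first}, {upper}"
    continue_stmt = f"{label} CONTINUE"
    return [
        InducedNode("assign", f"{index} = {first}", do_stmt),
        # ... nodes of the loop body lie between the DO and the CONTINUE ...
        InducedNode("increment", f"{index} = {index} + 1", continue_stmt),
        InducedNode("compare", f"{index} <= {upper}", continue_stmt),
    ]

for node in induce_do_loop("10", "I", "1", "N"):
    print(node.op, "|", node.text, "|", node.induced_by)
# assign | I = 1 | DO 10 I = 1, N
# increment | I = I + 1 | 10 CONTINUE
# compare | I <= N | 10 CONTINUE
```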

Such an implementation requires the addition of four more
fields to the basic node data structure. These fields are required
in order to communicate with the user in his own terms. The user
knows nothing about the underlying flow graph framework that the
analysis system is using to analyze his program. If the analysis
system finds an anomaly within an induced node, the system must
reference the node that induced it when communicating with the
user. Conversely, when the user edits his program, the system must
know which induced nodes are affected. The new fields of the nodes
are as follows:

9) Induced node.

This  is  a  Boolean  flaq  indicatinq  whether  or  not  the  node 
is  an  induced  node. 
10) Pointer to the inducing node.

If  the  node  is  an  induced  node,  then  this  is  a  pointer  to 
the  node  that  induced  it. 

11)  Cosmetic  node. 

This is a Boolean flag indicating that the node is present
for cosmetic purposes only and is to be ignored during the
analysis. A node that induces a set of nodes becomes a
cosmetic node.

12) List of induced nodes.

If the node induces a set of nodes, this list encodes the
set of nodes induced.

The set of source statements that induce nodes (and the type of
node induced) is language dependent, since high level constructs
vary between languages.

This concept of induced nodes can be applied to many different
aspects of the language (depending on its complexity). For
instance, a conditional statement that tests a compound condition
should induce a set of nodes, each of which tests only a simple
condition. If this approach is employed, the job of the invariant
assertion analysis is made much easier, because only nodes with
simple conditional tests need be addressed. Induced nodes should
also be employed with multiple or embedded assignment statements.
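One way to induce simple-test nodes from a compound condition is sketched below. It assumes conditions are written as nested tuples such as ("and", c1, c2) and that the simple conditions have no side effects, since the lowering evaluates them with short-circuit semantics. Each generated node holds one simple condition and its true and false successors; the node names T1, T2, ... and the function name are hypothetical.

```python
from typing import Dict, Tuple, Union

# A condition is a simple test (a string) or ("and" | "or", left, right).
Cond = Union[str, Tuple[str, "Cond", "Cond"]]

def split_condition(cond: Cond, on_true: str, on_false: str,
                    nodes: Dict[str, Tuple[str, str, str]]) -> str:
    """Add simple-test nodes for `cond` to `nodes`; return the entry node name.

    Each entry maps a node name to (simple condition, true successor,
    false successor).
    """
    if isinstance(cond, str):
        name = f"T{len(nodes) + 1}"
        nodes[name] = (cond, on_true, on_false)
        return name
    op, left, right = cond
    # Build the right operand first so the left test can branch to it.
    right_entry = split_condition(right, on_true, on_false, nodes)
    if op == "and":
        return split_condition(left, right_entry, on_false, nodes)
    if op == "or":
        return split_condition(left, on_true, right_entry, nodes)
    raise ValueError(f"unknown connective {op!r}")

# IF (X > 0 AND Y > 0) THEN go to S1 ELSE go to S2
nodes: Dict[str, Tuple[str, str, str]] = {}
entry = split_condition(("and", "X > 0", "Y > 0"), "S1", "S2", nodes)
print(entry, nodes)
# T2 {'T1': ('Y > 0', 'S1', 'S2'), 'T2': ('X > 0', 'T1', 'S2')}
```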
5.3 Global Flow Algorithm

The global flow algorithms can be easily implemented in any
language in which linked lists are easily implemented and
individual bits within a word are easily accessible. This section
addresses the implementation of specific aspects of the global flow
algorithms.


FLOW3(G, e, in)

        for all q ∈ G
              q.lat <- 1;
STEP1:        q.mark <- '1'B;
        e.lat <- in;
STEP2:
        do while ([q.mark = '1'B for some q ∈ G])
              for i <- 1 to |G|
                    if q[i].mark then
                          q[i].mark <- '0'B;
STEP3:                    for every s ∈ succ(q[i])
                                t <- f(q[i], s, q[i].lat) ∧ s.lat;
                                if t < s.lat then
                                      s.lat <- t;
                                      s.mark <- '1'B;
                                fi;
                          rof;
                    fi;
              rof;
        od;
END FLOW3


Figure 5.2
SPECIFICATION OF IMPROVED ALGORITHM
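FLOW3 can be sketched in Python for the bit-vector case. The sketch assumes that lattice elements are Python integers used as bit vectors, that the meet is bitwise AND (an all-paths property, so the top element 1 is the all-ones vector), that nodes are numbered 0..n-1 in reverse postorder, and that f(q, s, x) gives the value passed from q to its successor s; `flow3` and its parameter names are hypothetical. Because the meet is AND, t can never exceed s.lat, so testing t != lat[s] is equivalent to t < s.lat.

```python
from typing import Callable, List

def flow3(succ: List[List[int]], entry: int, entry_in: int,
          f: Callable[[int, int, int], int], top: int) -> List[int]:
    """Sketch of FLOW3 on a bit-vector lattice (meet = bitwise AND).

    Nodes are numbered 0..n-1 in reverse postorder; succ[i] lists the
    successors of node i; f(q, s, x) is the flow function along edge q -> s.
    """
    n = len(succ)
    lat = [top] * n                 # for all q: q.lat <- 1 (the top element)
    mark = [True] * n               # STEP1: every node starts out marked
    lat[entry] = entry_in           # e.lat <- in
    while any(mark):                # STEP2: repeat while some node is marked
        for i in range(n):          # one pass over the nodes in reverse postorder
            if mark[i]:
                mark[i] = False
                for s in succ[i]:   # STEP3
                    t = f(i, s, lat[i]) & lat[s]    # meet with the old value
                    if t != lat[s]:                 # i.e. t < s.lat
                        lat[s] = t
                        mark[s] = True              # s must be (re)processed
    return lat

# Diamond graph 0 -> {1, 2} -> 3.  Bit 0: X assigned, bit 1: Y assigned.
gen = [0b00, 0b01, 0b11, 0b00]      # node 1 assigns X; node 2 assigns X and Y
succ = [[1, 2], [3], [3], []]
result = flow3(succ, entry=0, entry_in=0b00,
               f=lambda q, s, x: x | gen[q], top=0b11)
print(result)   # [0, 0, 0, 1]
```

The example propagates a "definitely assigned" property: each node adds the variables it assigns, and the meet keeps only the variables assigned on every path, so on entry to node 3 only X is known to be assigned.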


Marking The Nodes


Nodes of the flow graph are marked (q.mark) to indicate that
they are to be processed. Marking can be accomplished by
associating one bit with each node and holding these bits
contiguously in memory. For this discussion, assume that the
number of nodes is less than or equal to the number of bits in a
word, so that all of the bits can be held in one single word of
memory, MARK. Then the assignment statement at STEP1 of Figure 5.2
(a copy of Figure 3.4) can be implemented by one assignment
statement that sets all of the bits of MARK to '1'B. The
"do while" statement at STEP2 can be implemented by a single test
that determines if MARK is zero.
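A sketch of MARK held in a single integer, under the same assumption that every node has one bit. Python integers are unbounded, so the one-word limit is not enforced here; the class and method names are hypothetical.

```python
class MarkWord:
    """Node marks packed into one integer: bit i is the mark of node q[i]."""

    def __init__(self, n: int) -> None:
        self.bits = (1 << n) - 1          # STEP1: every mark set to '1'B at once

    def any_marked(self) -> bool:
        return self.bits != 0             # STEP2: one test of MARK against zero

    def is_marked(self, i: int) -> bool:
        return ((self.bits >> i) & 1) == 1

    def clear_mark(self, i: int) -> None:
        self.bits &= ~(1 << i)            # q[i].mark <- '0'B

    def set_mark(self, i: int) -> None:
        self.bits |= 1 << i               # s.mark <- '1'B

mark = MarkWord(4)
print(bin(mark.bits))                          # 0b1111
for i in range(4):
    mark.clear_mark(i)
print(mark.any_marked())                       # False
mark.set_mark(2)
print(mark.is_marked(2), mark.is_marked(1))    # True False
```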

Successors 

A "for every s ∈ succ(q)" statement (see STEP3 of Figure 5.2)
can be implemented by holding the successors of q in the form of a
linked list and successively moving down the links of this list to
obtain the next successor. In PROP_ALL and PROP_SOME, the input
parameter X determines whether successors or predecessors are to be
used to move through the flow graph. This is easily implemented by
selecting as the initial link either the pointer to the list of
successors or the pointer to the list of predecessors.
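The successor walk and the choice of direction can be sketched as follows, assuming singly linked cells and a Boolean `forward` standing in for the parameter X (the actual encoding of X is not specified here); `Link` and `neighbours` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Link:
    node: int                        # number of the node this cell refers to
    next: Optional["Link"] = None    # following cell, or None at the end

def neighbours(succ_head: Optional[Link], pred_head: Optional[Link],
               forward: bool) -> Iterator[int]:
    """Yield successors (forward) or predecessors (backward) of one node."""
    link = succ_head if forward else pred_head   # only the first link differs
    while link is not None:
        yield link.node
        link = link.next

succ_list = Link(4, Link(7))     # node q has successors 4 and 7
pred_list = Link(1)              # and a single predecessor 1
print(list(neighbours(succ_list, pred_list, forward=True)))   # [4, 7]
print(list(neighbours(succ_list, pred_list, forward=False)))  # [1]
```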


5.4 Vectors


The power of the bit propagation techniques presented in
Section 1.2.1 lies in the fact that bit vectors can be employed to
perform the analysis on many properties in parallel. Clearly, the
bits should be held contiguously in memory so that machine bitwise
AND and OR operations can be applied to the bit vectors a word at
a time.
For those bit vectors that involve variables, the
correspondence between the individual variable and its bit position
in the vector can be determined by some natural ordering, say
alphabetical or by position in the symbol table. Once this
correspondence is established, it is used for all vectors involving
variables.
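A sketch of the variable-to-bit correspondence, assuming alphabetical ordering and Python integers as bit vectors; the helper names are hypothetical. One AND or OR on the integers combines every property at once, which mirrors the word-at-a-time parallelism described above.

```python
from typing import Dict, Iterable, List

def bit_positions(variables: Iterable[str]) -> Dict[str, int]:
    """Fix one bit position per variable, here by alphabetical order."""
    return {v: i for i, v in enumerate(sorted(set(variables)))}

def to_vector(names: Iterable[str], pos: Dict[str, int]) -> int:
    vec = 0
    for v in names:
        vec |= 1 << pos[v]
    return vec

def to_names(vec: int, pos: Dict[str, int]) -> List[str]:
    ordered = sorted(pos.items(), key=lambda p: p[1])
    return [v for v, i in ordered if (vec >> i) & 1]

pos = bit_positions(["SUM", "I", "N", "X"])   # I:0, N:1, SUM:2, X:3
a = to_vector(["I", "SUM"], pos)
b = to_vector(["SUM", "X"], pos)
print(to_names(a & b, pos))   # ['SUM']
print(to_names(a | b, pos))   # ['I', 'SUM', 'X']
```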

Vector Management

Because many elementary global flow analyses must be
performed, some mechanism for managing the separate vector sets
(that must be retained for further use) must be developed. The
following type of management system is suggested.

Each vector set, VEC, that must be retained for further use
should be allocated a partition, P(VEC), of memory (possibly some
form of auxiliary memory). An area of immediately accessible
memory should be reserved for the vector sets (sources, sinks, and
lattice elements) being used in the current elementary global flow
analysis, A. In the initialization of analysis A, any vector,
VEC1, used in the analysis can be transferred into immediately
accessible memory from P(VEC1) (with a block transfer
instruction, if available). Upon completion of analysis A, any
vector, VEC2, produced by the analysis that should be retained can
be transferred to P(VEC2). As vector sets are transferred into
immediately accessible memory, pointers (field 7) within the nodes
will require modification.
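The partition scheme can be sketched with dictionaries standing in for the partitions P(VEC) and for the immediately accessible area; list copies play the part of the block transfers. This is a hypothetical illustration: the class and method names are assumptions, and the re-pointing of node field 7 after a transfer is not shown.

```python
from typing import Dict, List

class VectorStore:
    """Retained vector sets, one partition per name, plus a working area."""

    def __init__(self) -> None:
        self.partitions: Dict[str, List[int]] = {}   # P(VEC): retained copies
        self.working: Dict[str, List[int]] = {}      # immediately accessible area

    def load(self, name: str) -> List[int]:
        """Copy P(name) into the working area (the 'block transfer')."""
        self.working[name] = list(self.partitions[name])
        return self.working[name]

    def save(self, name: str) -> None:
        """Copy a vector set produced by the current analysis into P(name)."""
        self.partitions[name] = list(self.working[name])

    def end_analysis(self) -> None:
        self.working.clear()      # the area is reused by the next analysis

store = VectorStore()
store.working["VEC2"] = [0b101, 0b011]   # produced by analysis A
store.save("VEC2")
store.end_analysis()
vecs = store.load("VEC2")                # needed by a later analysis
print(vecs)   # [5, 3]
```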
5.5 Implementation Of Assertions

A single data structure can be employed for all forms of
elementary assertions (as specified in Section 3.2.2.1). Recall
that a <rel assert> has the form <var><rel><const>; all other
elementary assertions have three or fewer fields. The uniform data
structure contains the following fields.

1)  Pointer  to  the  variable. 

Depending on the implementation of the compiler, this would
probably be a pointer to the symbol table entry of the
variable.

2) Pointer to the constant.

This would probably be a pointer to the symbol table entry of
the constant.

3) Relation.

A three bit code can be used to determine the relation
between the variable and the constant.

4) Type of assertion.

A three bit code can be used to identify whether the
assertion is:

• a <rel assert>,

• a <new assert>,

• a <pass assert>,

• a <no assert>, or

• a <delta assert>.
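The uniform record can be sketched as below. The thesis gives only the field list; the particular relation codes, the kind codes, and the use of None for fields an assertion does not need are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Relation(IntEnum):
    # Hypothetical 3-bit relation codes (six relations fit in three bits).
    LT = 0
    LE = 1
    EQ = 2
    NE = 3
    GE = 4
    GT = 5

class AssertKind(IntEnum):
    # Hypothetical 3-bit codes for the five kinds of elementary assertion.
    REL = 0      # <rel assert>: <var><rel><const>
    NEW = 1      # <new assert>
    PASS = 2     # <pass assert>
    NO = 3       # <no assert>
    DELTA = 4    # <delta assert>

@dataclass(frozen=True)
class Assertion:
    var: Optional[int]          # 1) symbol-table index of the variable
    const: Optional[int]        # 2) symbol-table index of the constant
    rel: Optional[Relation]     # 3) relation (meaningful for <rel assert>)
    kind: AssertKind            # 4) type of assertion

# "X <= 10", assuming X is symbol-table entry 12 and 10 is entry 3.
a = Assertion(var=12, const=3, rel=Relation.LE, kind=AssertKind.REL)
print(a.kind.name, a.rel.name)   # REL LE
```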
For efficiency of processing, the conjunctive normal form of
the assertion can be held in a canonical linked list of the
following form.

Each disjunction is held in the form of a linked list. Within
the list, <rel assert>s are held in order sorted by

Major: pointer to variable
Intermediate: pointer to constant
Minor: relation

The conjunction of disjunctions is also held in the form of a
linked list, where the conjuncts are held in order sorted by
the first member of each disjunction.
This ordering can be used to efficiently perform searches through
the assertions when assertions are being combined or simplified.
For instance, assume we wish to perform simplifications on basic
assertions involving the variable X. Since the assertions in each
disjunction are sorted, all basic assertions involving X will be
grouped into a region of the linked list. In searching down each
linked list, once this region has been processed the list need not
be searched further.

This ordering can be efficiently implemented by holding the
fields of the assertion contiguously in memory (major, int., minor)
and treating the assertion as the numerical equivalent of this bit
string when determining its position in a linked list. In this way
the three pertinent fields of the assertion need not be addressed
separately. Of course, an actual sort need never be performed on
any linked list; when a basic assertion is to be inserted into a
linked list, it can be inserted into its properly ordered position.
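A sketch of the packed ordering key and of ordered insertion into a singly linked disjunction list. The field widths (16 bits for the constant, 3 for the relation) and all names are assumptions; symbol-table pointers are modelled as small integers.

```python
from dataclasses import dataclass
from typing import List, Optional

CONST_BITS = 16          # assumed width of the constant (intermediate) field
REL_BITS = 3             # three-bit relation code (minor field)

def pack(var: int, const: int, rel: int) -> int:
    """Pack (major, intermediate, minor) into one number that sorts correctly."""
    return (var << (CONST_BITS + REL_BITS)) | (const << REL_BITS) | rel

def var_of(k: int) -> int:
    return k >> (CONST_BITS + REL_BITS)

def const_of(k: int) -> int:
    return (k >> REL_BITS) & ((1 << CONST_BITS) - 1)

def rel_of(k: int) -> int:
    return k & ((1 << REL_BITS) - 1)

@dataclass
class Cell:
    key: int
    next: Optional["Cell"] = None

def insert_ordered(head: Optional[Cell], k: int) -> Cell:
    """Insert k at its ordered position; the list never needs sorting."""
    if head is None or k < head.key:
        return Cell(k, head)
    cur = head
    while cur.next is not None and cur.next.key <= k:
        cur = cur.next
    cur.next = Cell(k, cur.next)
    return head

def assertions_on(head: Optional[Cell], var: int) -> List[int]:
    """Collect the keys for `var`, stopping once its region has been passed."""
    found = []
    cur = head
    while cur is not None and var_of(cur.key) <= var:
        if var_of(cur.key) == var:
            found.append(cur.key)
        cur = cur.next
    return found

head: Optional[Cell] = None
for var, const, rel in [(2, 5, 1), (1, 7, 0), (2, 3, 4), (3, 0, 0)]:
    head = insert_ordered(head, pack(var, const, rel))
found = [(var_of(k), const_of(k), rel_of(k)) for k in assertions_on(head, 2)]
print(found)  # [(2, 3, 4), (2, 5, 1)]
```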
6 SUMMARY


A basic flow graph structure has been presented, and
improvements to the existing iterative global flow techniques of
Kildall and Kam & Ullman have been developed. A number of
programming anomalies commonly found in students' programs have
been identified, and the improved iterative global flow techniques
have been applied to the problem of detecting specific instances of
these anomalies. The detection is performed without any knowledge
of the algorithm the student is attempting to implement.


The improved form of Kam & Ullman's algorithm has several
desirable properties. Within each pass of the algorithm through
the nodes of the flow graph, nodes in regions of the graph where
lattice information has stabilized are not processed. Thus,
unnecessary processing, which sometimes occurs in Kam & Ullman's
algorithm, is eliminated. Because of the way in which nodes are
marked in order to determine whether a node must be processed, the
improved algorithm normally converges in one fewer pass over the
nodes than Kam & Ullman's algorithm and always converges in no more
than d(G,T) + 1 passes over the nodes of the flow graph. The
combination of these two properties means that the improved
algorithm in general processes fewer nodes; in many cases as many
as 50 percent fewer nodes.


The improved iterative global flow algorithm has been applied
within three distinctly different frameworks.
1) Property P propagation supplies an efficient framework into
which many of the classical global flow analyses fall.
This framework uses bit vectors and bitwise operations to
propagate information through the flow graph. Property P
propagation has been applied to problems involving data
flow and program structure.

2) A call graph, which encodes the calling dependency between
subroutines, has been developed, and a semilattice
structure has been superimposed upon it. This framework is
used to transmit information between subroutines. Explicit
and implicit recursive subroutine calls do not constitute a
special case and are easily handled within this framework.

3) An invariant assertion analysis has been developed in which
information is collected about the value of variables.
This framework has been applied to the detection of
anomalies involving the value of variables.

A system design incorporating these techniques has been
presented along with a basic philosophy to be applied when
interacting with the student.


6.1 Conclusions

Iterative global flow techniques are a powerful tool for
detecting a variety of anomalies in programs, ranging from data
flow problems to problems of program structure.
Detection schemes, which use the three basic frameworks that
have been developed in this thesis, can be incorporated into an
"intelligent" interactive compiler system capable of advising the
student about specific anomalies present in his program. In this
way, the student's attention can be directed to regions of his
program where logic errors may be present.


6.2  Other  Applications 


The techniques developed in this thesis have obvious
applications in compiler optimization. Many of the techniques used
in the detection schemes are taken directly from the compiler
optimization discipline (e.g., live variable analysis and common
subexpression detection). The improved algorithm is applicable to
the same range of problems as the original algorithm presented by
Kam & Ullman.


Applications Of Property P Propagation

Several of the specific anomalies to which property P
propagation has been applied have a direct use in compiler
optimization. Three specific examples are presented here.


Unreferenced data constitutes a situation in which a complete
assignment statement can be eliminated from the source program.
In terms of object code, the deletion of such an assignment
statement allows the elimination of the evaluation of an expression
and the assignment of this value to a variable. The elimination of
the evaluation of the expression can only be performed if there are
no side effects associated with its evaluation. The elimination of
such a source statement may uncover more unreferenced data.
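The cascading effect can be illustrated on straight-line code. This is a simplified sketch rather than the property P detection scheme itself: it assumes a single block with no branches, a known set of variables live at its exit, and a flag marking expressions with side effects (such statements are conservatively kept whole). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class Assign:
    target: str
    uses: FrozenSet[str]
    side_effects: bool = False   # e.g. the expression calls a function

def is_referenced(stmts: List[Assign], k: int, live_out: Set[str]) -> bool:
    """May the value assigned by stmts[k] be referenced later?"""
    v = stmts[k].target
    for later in stmts[k + 1:]:
        if v in later.uses:
            return True
        if later.target == v:    # overwritten before any reference
            return False
    return v in live_out

def eliminate_unreferenced(stmts: List[Assign], live_out: Set[str]) -> List[Assign]:
    changed = True
    while changed:               # each deletion may uncover more unreferenced data
        changed = False
        for k, st in enumerate(stmts):
            if not st.side_effects and not is_referenced(stmts, k, live_out):
                stmts = stmts[:k] + stmts[k + 1:]
                changed = True
                break
    return stmts

prog = [
    Assign("A", frozenset()),          # A = 1
    Assign("B", frozenset({"A"})),     # B = A + 1   (B is never referenced)
    Assign("C", frozenset()),          # C = 2
    Assign("D", frozenset({"C"})),     # D = C
    Assign("E", frozenset(), True),    # E = F(0)    (F has side effects)
]
print([s.target for s in eliminate_unreferenced(prog, live_out={"D"})])
# ['C', 'D', 'E']
```

Deleting B = A + 1 is what makes A = 1 unreferenced, so the second deletion only appears on the next round.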

The elimination of certain forms of transfer variables can
produce a small code improvement. Each transfer variable that can
be eliminated allows the deletion of one load and one store
instruction (assuming a register machine) and frees one word of
memory.

The input/output status of subroutine parameters can be used
to determine the type of code generated. Clearly, if a given
parameter is neither an input variable nor an output variable, it
need not be addressed in the subroutine linkage. This saves both
execution time and space (both data and program space).

Property P propagation can be applied to a variety of other
compiler optimization problems. For instance, consider the
following problem:

     Assume we have a compiler that will perform an execution-time
     test to check if variables have been initialized. (Two such
     compilers that have this feature are PL/C and WATFIV.) A
     naive approach is to include the run-time test for each
     reference to every variable, but this is very expensive in
     both time and space. Clearly, execution-time tests need not
     be included at reference points of a variable, V, that are
     dominated by assign points of V. Property P propagation can
     be used to identify such reference points.

The reader should have no trouble designing a detection scheme for
this problem.
Applications Of Invariant Assertion Analysis

Invariant assertion analysis has an obvious application to the
general problems of assertion generation and program proving. By
itself, it is not a program prover, but it may prove to be a useful
tool in the development of a program proving system.

Invariant  assertion  analysis  would  have  an  interesting 
application  within  a  compiler  in  which  the  ability  to  make 
assertions  about  variables  of  the  program  is  an  integral  part  of 
the  source  language  (riOV76  1).   Such  a  feature  can  be  naturally 
incorporated  into  the  framework  of  invariant  assertion  analysis. 
An in-line source assertion, A, can be considered to be an ordinary
source statement whose execution assertion is A.

Invariant assertion analysis has an interesting application in
compiler optimization. It can determine certain regions of the
source program that cannot be reached by any execution path. These
regions can be deleted from the flow graph and need not be
considered in further optimization. Object code need not be
generated for such regions. As presented in Section 4.2.4,
uniformly true or false conditional tests can be detected.
For such conditional tests, the appropriate action is
predetermined, and the actual test can be deleted.

The particular realization of invariant assertion analysis
presented in this thesis deals with assertions that encode
information about the value of variables, but other forms of
invariant assertion analysis might utilize assertions that encode
different information, say information about the type of the
variable. As a concrete example of such an application, consider
an array language, such as 0L2 (fPHl72"|), which has as its major
data type different forms of arrays: rectangular, lower triangular,
diagonal, tridiagonal, Hessenberg, etc.

Consider the two program segments in Figures 6.1 and 6.2.


S1     M <- LT;
S2     for i <- 1 to n
S3           M <- M*DIAG;
S4           M <- M + LT;
S5     rof;

Figure 6.1
M IS A LOWER TRIANGULAR MATRIX


S6     M <- LT;
S7     for i <- 1 to n
S8           M <- M*DIAG;
S9           M <- M + R;
S10    rof;

Figure 6.2
M IS A RECTANGULAR MATRIX


Assume the following declarations for the variables:

• M is an n by n rectangular matrix,

• LT is an n by n lower triangular matrix,

• DIAG is an n by n diagonal matrix, and

• R is an n by n rectangular matrix.

In these figures, + and * represent matrix addition and
multiplication, respectively. Note that the only difference
between the two code sequences occurs at S4 and S9; in Figure 6.1 a
lower triangular matrix is added to M, and in Figure 6.2 a
rectangular matrix is added to M. Upon exit from the for loop in
Figure 6.1, M is, in fact, a lower triangular matrix; i.e., all the
entries above the main diagonal are zero. Upon exit from the for
loop in Figure 6.2, M is a full rectangular matrix.

A   realization  of  invariant  assertion  analysis,  which  uses 
assertions  that  encode  information  about  array  types,  can  be 
developed  to  determine  the  maximum  size  required  for  an  array. 
When  this  maximum  required  size  is  found  to  be  less  than  the 
declared  size,  the  analysis  system  can  automatically  apply  the 
restricted  attributes  to  the  array.   Such  an  application  of 
invariant  assertion  analysis  constitutes  an  optimization  in  both 
time  and  space. 
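The matrix-type reasoning behind Figures 6.1 and 6.2 can be sketched as an iterative analysis over a tiny lattice. This is a minimal illustration under stated assumptions, not the thesis's realization: the shape constants, `mul`, `add`, and `loop_exit_shape` are hypothetical names, each shape is modeled as the set of regions that may be nonzero, and paths are combined at the loop header by set union until the header shape stops changing.

```python
# Hypothetical shape lattice: a shape is the set of regions that may be nonzero.
DIAG = frozenset({"diag"})
LOWER = frozenset({"diag", "lower"})
UPPER = frozenset({"diag", "upper"})
RECT = frozenset({"diag", "lower", "upper"})
NAMES = {DIAG: "diagonal", LOWER: "lower triangular",
         UPPER: "upper triangular", RECT: "rectangular"}

def add(a, b):
    # A sum may be nonzero wherever either operand may be nonzero.
    return a | b

def mul(a, b):
    # Conservative product rule for this small lattice only.
    if a == DIAG:
        return b
    if b == DIAG:
        return a
    if a == b:
        return a      # lower*lower = lower, upper*upper = upper, rect*rect = rect
    return RECT

def loop_exit_shape(initial, addend):
    """Model: M <- initial; for i: M <- M*DIAG; M <- M + addend; rof.
    Iterate until the shape at the loop header is stable (finite lattice,
    monotone transfer functions, so this terminates)."""
    header = initial
    while True:
        body = add(mul(header, DIAG), addend)
        new_header = header | body      # combine entry path and back edge
        if new_header == header:
            return header
        header = new_header

print(NAMES[loop_exit_shape(LOWER, LOWER)])  # lower triangular (Figure 6.1)
print(NAMES[loop_exit_shape(LOWER, RECT)])   # rectangular (Figure 6.2)
```

The exit shape is taken from the loop header, since the loop may execute zero times.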


6.3 Further Work


The anomalies presented in Chapter 2 are representative
examples of the types of anomaly present in students' programs, and
in no way exhaust the types or forms of anomalies that occur. A
large set of similar anomalies can be identified, and detection
schemes can be developed for them. Because there is such a large
set of such anomalies, attention should be directed toward those
anomalies that occur most frequently in students' programs. This
may turn out to be language dependent. The analysis of the machine
problems mentioned in Chapter 1 is an initial approach to this
problem.

A basic inefficiency inherent in the detection schemes is that
whenever the student performs any editing, all previously
determined information is discarded and the analysis begins again
with the DFST algorithm. This is inefficient because the information
associated with the previous program structure may be useful in
determining information about the new program structure. Some work
has been done on this problem in the area of interval global flow
analysis ([GIL76U]). Similar techniques should be developed in the
area of iterative global flow analysis.


6.3.1 Error And Anomaly Evaluation

This thesis has developed detection schemes for finding
specific instances of a number of program anomalies commonly found
in students' programs. The general philosophy of the thesis has
been that once the anomaly has been brought to the student's
attention, it becomes the student's responsibility to determine the
error, if any, that caused the anomaly. This is a reasonable
approach and constitutes a well-balanced man/machine problem-solving
team. But in some situations the student will simply not
be able to determine the logic error that induced the anomaly.
Thus, a more helpful system would be able, at least in some
cases, to make a reasonable "guess" as to what the underlying logic
error is. The discussion below presents two examples of how the
system might infer such information once unreferenced data has been
detected.

In Section 2.1 several examples were presented that associate
specific instances of unreferenced data with assumed logic errors.
Consider the code sequence

I  =  1 

DO  10  I  =  1,20 
in which unreferenced data occurs because the index variable is
initialized outside the DO loop. Such a situation can be detected
by determining whether the one and only "dereference" point of the
unreferenced assignment statement is associated with the entry of a
DO loop. When such a configuration is detected, the student can be
given a very specific statement about the initialization that occurs
at a DO statement and about the fact that the index of the DO loop
need not be initialized outside the loop.
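The DO-index check above can be sketched for straight-line code. This is a hypothetical illustration, not the thesis's detection scheme: the `Stmt` record, its `kind` values, and `do_index_preinit` are assumed names, and a real implementation would work on the flow graph rather than on a statement list.

```python
from dataclasses import dataclass

@dataclass
class Stmt:
    kind: str          # hypothetical kinds: "assign" or "do"
    target: str        # variable assigned, or the DO index
    uses: tuple = ()   # variables referenced by the statement

def do_index_preinit(stmts):
    """Return (i, j) pairs where the value assigned at stmts[i] is never
    referenced and its only dereference point is the DO statement stmts[j]
    that re-initializes the same variable. Straight-line code only."""
    found = []
    for i, s in enumerate(stmts):
        if s.kind != "assign":
            continue
        for j in range(i + 1, len(stmts)):
            t = stmts[j]
            if s.target in t.uses:
                break                     # value is referenced: not unreferenced data
            if t.target == s.target:
                if t.kind == "do":
                    found.append((i, j))  # killed by the DO loop entry
                break                     # value dereferenced here
    return found

# I = 1 followed by DO 10 I = 1,20
print(do_index_preinit([Stmt("assign", "I"), Stmt("do", "I")]))  # [(0, 1)]
```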

As a second example, consider the code sequence

SUM = 0
DO 10 I = 1,5
10 SUMM = SUM + A(I)

in which two actual variables (SUMM and SUM) have been used for one
logical variable. In many cases of this type of error, the
occurrence of unreferenced data is accompanied by a corresponding
uninitialized variable anomaly (in this case, SUM is uninitialized
at statement 10), and the two variable names associated with the
two separate anomalies are normally very "similar." If a measure
can be developed to determine how "close" two variable names are
(T i ")R 70 1>), then a specific unreferenced data anomaly and a
specific uninitialized variable anomaly may be found to correspond
to one another. In such a situation the system can suggest to the
student that he may have used two different variable names for one
logical variable.
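The text leaves the closeness measure open. As one assumed choice, not the measure the thesis cites, the sketch below uses Levenshtein edit distance with a hypothetical threshold of 1 to pair variables flagged for unreferenced data with variables flagged as uninitialized; `edit_distance` and `suspect_pairs` are illustrative names.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance by dynamic programming over prefixes of a and b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def suspect_pairs(unreferenced, uninitialized, max_dist=1):
    """Pair names from the two anomaly lists that are within max_dist edits."""
    return [(u, v) for u in unreferenced for v in uninitialized
            if u != v and edit_distance(u, v) <= max_dist]

print(edit_distance("SUMM", "SUM"))                 # 1
print(suspect_pairs(["TOTL"], ["TOTAL", "I"]))      # [('TOTL', 'TOTAL')]
```

A threshold of 1 catches single doubled or dropped letters; longer names might warrant a threshold scaled to name length.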

The two examples presented above are special cases in which
the student can be informed about a possible underlying logic
error. Further work should be done to determine the
characteristics of anomalies associated with specific types of
logic errors. The analysis of the program sets mentioned in
Chapter 1 is a step in this direction. The outcome of such an
analysis will probably identify a large number of special cases
that must be detected (as indicated by the two examples given
above). Such a set of special cases is undesirable because the
detection of each case will probably require separate code.
Further work should be done to uncover a uniform approach for
correlating anomalies and their underlying logic errors.

Transfer Variables

Further work must be done to identify which specific instances
of transfer variables should be brought to the student's attention.
The purpose of detecting transfer variables is to detect code
sequences such as

B = A

A = B + 1

(where B is not referenced further in the program). Unfortunately,
transfer variables (as defined in Section 2.8) are associated with
other useful programming techniques. Three examples of transfer
variables that are associated with useful programming techniques
are presented below, but further work must be done to identify
others.

1) Evaluation of an invariant expression outside a loop --
Given an invariant expression, X, that is referenced within
a loop, it is common practice to assign the value of X to
a temporary variable, V, outside the loop and to reference
the value of V inside the loop. Such a situation
constitutes a transfer variable, but is useful for
producing efficient execution. This form of a transfer
variable should not be brought to the student's attention.

2) Multiple computation of the same expression --
A transfer variable is often used to transfer the value of
an expression to several different positions within the
program. Such a use of a transfer variable may save
execution time in that the value of the expression need
only be evaluated once. Such a situation is clearly a
useful application of a transfer variable and should not be
brought to the student's attention.

3) Initialization --
A standard programming technique is to initialize the value
of a variable to a constant value and then reference the
variable instead of the constant. Such a technique is
useful in parameterizing a program and should not be
brought to the student's attention.

Each of the three forms of a transfer variable presented above can
be automatically identified so that it is not brought to the
student's attention.
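The filtering of the three benign forms can be sketched as a classifier over facts that a flow analysis would supply. This is a hypothetical sketch: `TransferUse`, its fields, and `benign_reason` are assumed names, and computing those fields (loop invariance, reference counts) is left to the analyses of Chapter 4.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransferUse:
    # Hypothetical summary of one transfer-variable assignment.
    rhs_is_constant: bool         # e.g. N = 100
    reference_count: int          # reference points reached by the assignment
    assigned_outside_loop: bool   # assignment precedes a loop ...
    rhs_invariant_in_loop: bool   # ... that does not modify the right-hand side
    referenced_inside_loop: bool  # ... and the value is used inside that loop

def benign_reason(t: TransferUse) -> Optional[str]:
    """Return which of the three benign forms applies, or None."""
    if t.assigned_outside_loop and t.rhs_invariant_in_loop and t.referenced_inside_loop:
        return "invariant expression evaluated outside loop"   # form 1
    if t.reference_count > 1:
        return "expression computed once, used several times"  # form 2
    if t.rhs_is_constant:
        return "initialization / parameterization"             # form 3
    return None

def should_report(t: TransferUse) -> bool:
    return benign_reason(t) is None

# B = A; A = B + 1 with B referenced once, no loop: reported.
print(should_report(TransferUse(False, 1, False, False, False)))  # True
```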


6.3.2 Heuristics Based On Experience

The anomalies presented in Chapter 2 constitute clear-cut
program constructs that should be brought to the student's
attention. The reason that these anomalies can be detected in a
straightforward way (as evidenced in Chapter 4) is that their
detection does not depend on the semantics of the variables used in
the program. There are, however, program constructs that
constitute anomalies dependent on the semantics associated with the
variables involved. (In fact, transfer variables constitute such a
construct.) The techniques developed in Chapter 4 can also be used
as tools in the development of detection schemes for
semantic-dependent anomalies. As an example of such a
semantic-dependent anomaly, consider the situation in which the loop
initialization has been placed inside the loop.

Initialization Inside Loop

Beginning programmers will often place the initialization of a
loop inside the loop itself. For instance, consider the program
segment in Figure 6.3. The label "20" at S2 should clearly be
moved down to S3.

S1        I = 1

S2   20   SUM = 0

S3        SUM = SUM + A(I)

S4        I = I + 1

S5        IF (I.LE.10) GOTO 20

Figure 6.3

INITIALIZATION INSIDE LOOP

The following heuristic might be employed for detecting such a
situation.

The assignment of a variable V at a statement S potentially
constitutes a loop initialization inside a loop L if:

• V assigned at S is a transfer variable,

• the value of the expression assigned to V at S is
invariant to loop L, and

• the statements inside loop L at which V is referenced
(for which the assignment at S is the source of the data)
are all at the same loop nesting level.

Of course, more restrictive conditions can be added to the
three presented above.
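The three conditions can be checked mechanically once the supporting facts are available. In this hypothetical sketch, the transfer-variable test (Section 2.8) and the nesting levels of the reached references are assumed to be supplied by other analyses; loop invariance is approximated by the simple rule that the right-hand side reads no variable assigned in the loop. `Assign` and `flag_init_inside_loop` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Assign:
    label: str
    target: str
    uses: set = field(default_factory=set)  # variables read by the right-hand side

def flag_init_inside_loop(body, transfer_labels, reached_levels):
    """body: assignments inside loop L.
    transfer_labels: labels whose assignment is a transfer variable (assumed input).
    reached_levels: label -> nesting levels of the references it reaches (assumed input)."""
    modified = {s.target for s in body}       # variables assigned anywhere in L
    flagged = []
    for s in body:
        invariant = not (s.uses & modified)   # simple invariance approximation
        levels = set(reached_levels.get(s.label, []))
        if s.label in transfer_labels and invariant and len(levels) == 1:
            flagged.append(s.label)
    return flagged

# The loop S2..S5 of Figure 6.3 (S5 is not an assignment and is omitted).
body = [Assign("S2", "SUM"),
        Assign("S3", "SUM", {"SUM", "A", "I"}),
        Assign("S4", "I", {"I"})]
print(flag_init_inside_loop(body, {"S2"}, {"S2": [1]}))  # ['S2']
```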

Note that the detection of such a construct does not guarantee
that a loop initialization has been found inside the loop, but the
probability is high that such an initialization is present. It can
therefore be suggested to the student that he may have placed a
loop initialization inside the loop.
6.3.3 Extensions of Invariant Assertion Analysis

As discussed in Section 3.2.2.2, extensions can be made to the
realization of invariant assertion analysis presented in this
thesis. Further work should be done to develop more powerful forms
of invariant assertion analysis.

An interesting extension of invariant assertion analysis would
be to embed "trace" information within the basic assertions
collected at each node of the flow graph, recording the source of
each assertion. For instance, given that a basic assertion [I > 1]
is attached to node n, it may be of interest to the student to know
why this assertion is true at node n. If pointers relating the
basic assertion back to the nodes that contributed information to
the assertion are maintained, then this question can be at least
partially answered.
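A minimal sketch of such trace information, assuming an acyclic flow graph processed in topological order: each assertion carries the set of nodes that produced it, the meet keeps only assertions true on every incoming path, and it unions their provenance sets. The function names and the string encoding of assertions are hypothetical; a cyclic graph would need the usual iteration to a fixed point.

```python
def meet(facts_list):
    """Keep only assertions true on every incoming path; union their
    provenance sets so every contributing node is remembered."""
    if not facts_list:
        return {}
    common = set(facts_list[0])
    for facts in facts_list[1:]:
        common &= set(facts)
    return {a: set().union(*(facts[a] for facts in facts_list)) for a in common}

def propagate(order, preds, gen, kill):
    """Single forward pass in topological order (acyclic graph assumed).
    Each result maps an assertion to the set of nodes that produced it."""
    out = {}
    for n in order:
        facts = meet([out[p] for p in preds.get(n, [])])
        facts = {a: src for a, src in facts.items() if a not in kill.get(n, set())}
        for a in gen.get(n, set()):
            facts[a] = {n}          # node n is the source of this assertion
        out[n] = facts
    return out

# n0 branches to n1 (I = 2) and n2 (I = 5); both reach the join node n3.
out = propagate(["n0", "n1", "n2", "n3"],
                {"n1": ["n0"], "n2": ["n0"], "n3": ["n1", "n2"]},
                {"n1": {"I > 1"}, "n2": {"I > 1"}},
                {})
print(sorted(out["n3"]["I > 1"]))  # ['n1', 'n2']: why I > 1 holds at n3
```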

6.3.4  Arrays 

Subscripted variables have not been addressed in this thesis
at all. When a subscripted variable is encountered in the source
text, it is essentially ignored. Array references do not, for
instance, have corresponding bit positions in VECTOR2 and VECTOR3
as presented in Chapter 4. In order to explain why arrays have
been essentially ignored, a general discussion on arrays and two
specific problems are presented below. Arrays are an integral part
of all high level programming languages, and their omission
constitutes an important unresolved gap in this thesis. At the end
of this section two different approaches, which supply partial
solutions to the problem of handling arrays, are discussed, but
further work must be done in this area.

Consider the code sequence shown in Figure 6.4. Four explicit
array references are shown.

S1   A(I1) = 3

S2   A(I2) = B + C

S3   D = A(I3)

S4   A(I4) = 5

Figure 6.4

IDENTIFYING ARRAY REFERENCES

The problem is to determine which of the array references are
references to the same element of the array A. This, of course,
depends on the values of the index variables. The example given here
is a very simple one if we assume, for instance, that there are no
loops involved and that the index variables used are integer
variables. The problem becomes much more complex if these
assumptions do not hold. For instance, if a loop is involved, the
index variables of the array references may depend on the induction
variable of the loop, and, thus, the identification of identical
array element references may depend on previous executions of the
loop; i.e., A(I3) referenced at S3 may be the same array element as
A(I1) assigned at S1 three iterations of the loop ago. If real
variables are allowed as index variables, then two array references
may reference the same array element even though the corresponding
index variables do not have the same value (since the real values
are truncated to determine integral indices into the array). In
order to clarify the problem further, consider the two anomalies of
unreferenced data and uninitialized variables.

Unreferenced Data

In order to perform live variable analysis, references to
identical array elements must be identified. For instance, if
A(I1) (at S1) and A(I2) (at S2) reference the same array element
and there are no references to array A between S1 and S2, then
A(I1) is dead at S1. On the other hand, if A(I1) (at S1) and A(I3)
(at S3) reference the same array element, but A(I2) (at S2) does
not, then A(I1) is live at S1.
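The two cases above can be sketched as a backward liveness question over the straight-line code of Figure 6.4, provided something answers "do these two subscripts name the same element?". Here that oracle is simply assumed (`make_same` is a hypothetical stand-in built from given equivalence classes); a real analysis would also have to distinguish "may be the same" for uses from "must be the same" for kills, which this sketch ignores.

```python
# Figure 6.4 as (label, subscript written into A, subscripts read from A).
FIG_6_4 = [
    ("S1", "I1", []),
    ("S2", "I2", []),
    ("S3", None, ["I3"]),
    ("S4", "I4", []),
]

def make_same(classes):
    """Hypothetical alias oracle: subscripts in one class name the same element."""
    def same(x, y):
        return x == y or any(x in c and y in c for c in classes)
    return same

def is_live_after(stmts, k, same):
    """Is the element of A written at stmts[k] read before it is overwritten?"""
    target = stmts[k][1]
    for _, write, reads in stmts[k + 1:]:
        if any(same(target, r) for r in reads):
            return True    # a later reference uses the value: live
        if write is not None and same(target, write):
            return False   # overwritten before any use: dead
    return False           # straight-line code, no use after S4 assumed

print(is_live_after(FIG_6_4, 0, make_same([{"I1", "I3"}])))  # True
print(is_live_after(FIG_6_4, 0, make_same([{"I1", "I2"}])))  # False
```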

Uninitialized Variables

In order to perform an uninitialized variable analysis,
identical array element references must be identified. For
instance, consider the reference to A(I3) at S3. In order to
determine whether A(I3) has been previously assigned, every
assignment to array A within the ancestors of S3 must be examined
to determine which elements of A it assigns.
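The forward counterpart can be sketched the same way: a read is flagged unless some earlier assignment is known to write the same element. As before, the alias oracle is an assumed input (`make_same` is hypothetical), and only the straight-line ancestors of S3 in Figure 6.4 are considered.

```python
# Figure 6.4 as (label, subscript written into A, subscripts read from A).
FIG_6_4 = [
    ("S1", "I1", []),
    ("S2", "I2", []),
    ("S3", None, ["I3"]),
    ("S4", "I4", []),
]

def make_same(classes):
    """Hypothetical alias oracle: subscripts in one class name the same element."""
    def same(x, y):
        return x == y or any(x in c and y in c for c in classes)
    return same

def possibly_uninitialized(stmts, k, same):
    """Subscripts read at stmts[k] that no earlier statement is known to write."""
    written = [w for _, w, _ in stmts[:k] if w is not None]
    return [r for r in stmts[k][2] if not any(same(w, r) for w in written)]

print(possibly_uninitialized(FIG_6_4, 2, make_same([{"I1", "I3"}])))  # []
print(possibly_uninitialized(FIG_6_4, 2, make_same([])))              # ['I3']
```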


Banerjee ([BAN7*S]) has recently developed techniques for
uncovering data dependencies of array references with polynomial
index sets within loops. These techniques have been applied to the
problem of determining whether a loop can be "unrolled" and its
separate iterations executed in parallel. The techniques are
capable of determining whether there exists an assignment to an
array element in one iteration of the loop that is referenced in a
different iteration of the loop (so that the loop cannot be
executed in parallel). This is not sufficient for the requirements
of this thesis, but the underlying techniques may prove useful as a
partial solution to the array identification problems involved in
anomaly detection.
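As a much simpler, well-known relative of such dependence tests (not a reconstruction of the cited work), the GCD test handles linear subscripts: A(a*i + b) and A(c*j + d) can name the same element for some integers i and j only if gcd(a, c) divides d - b. Loop bounds are ignored, so a True answer means only "dependence possible"; a False answer rules it out. The function name `gcd_test` is illustrative.

```python
from math import gcd

def gcd_test(a, b, c, d):
    """Can A(a*i + b) and A(c*j + d) refer to the same element for some
    integers i, j? Necessary condition: gcd(a, c) divides d - b."""
    g = gcd(a, c)
    if g == 0:                 # both subscripts are constants
        return b == d
    return (d - b) % g == 0

print(gcd_test(2, 0, 2, 1))   # False: A(2*i) is even, A(2*j + 1) is odd
print(gcd_test(1, 0, 1, 3))   # True: A(i) and A(j + 3) may coincide
```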

Within his framework, Kildall ([KIL76*]) has developed a
"constant propagation and common subexpression elimination"
analysis. The lattice space used in the analysis is the set of
partitions of the set of subexpressions of the program being
analyzed. Two expressions are placed in the same partition if
their execution-time values are equal. In essence, the analysis is
able to determine subexpressions that have the same value at
execution time. For instance, assume that there are exactly two
statements between S1 and S2 of Figure 6.4 and that these
statements are

I2 = I1 + 1 and
I3 = I2 - 1

Upon completion of the analysis, the lattice element attached to S2
and S3 will contain one partition in which the expressions I1,
I2 - 1, and I3 are members, and another partition in which the
expressions I2 and I1 + 1 are members. From this information it
can be concluded that the array element A(I1) referenced at S1 is
the same as the array element A(I3) referenced at S3 and different
from the array element A(I2) referenced at S2. Having determined
this information, it can be concluded that A(I1) (at S1) is live.
These conclusions are based on the implicit observation that I1 was
not modified between its use as an index variable into array A (at
S1) and its subsequent use within expressions. The validity of
this assumption can be guaranteed by assigning the value of each
subscript expression to a unique system-defined variable. The
identical array element reference question can then be resolved by
determining the partitions into which these system-defined
variables fall.
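The conclusion for Figure 6.4 can be reproduced with a much lighter technique than Kildall's partition lattice: value numbering in which each integer variable is tracked as (base variable, constant offset). This swapped-in sketch handles only assignments of the form X = Y + k and assumes, as the text does, that base variables are not modified in between; `value_numbers` is a hypothetical name.

```python
def value_numbers(assignments):
    """Track each variable as (base variable, constant offset).
    Each assignment is (target, source, k), meaning target = source + k."""
    val = {}

    def lookup(v):
        return val.get(v, (v, 0))   # unassigned variable: its own base, offset 0

    for target, source, k in assignments:
        base, off = lookup(source)
        val[target] = (base, off + k)
    return lookup

# I2 = I1 + 1 and I3 = I2 - 1, the two statements assumed between S1 and S2.
lookup = value_numbers([("I2", "I1", 1), ("I3", "I2", -1)])
print(lookup("I3") == lookup("I1"))  # True: A(I3) at S3 is A(I1) from S1
print(lookup("I2") == lookup("I1"))  # False: same base, offsets differ by 1
```

Two subscripts with the same base and different offsets are definitely different elements, which is the fact needed to show that A(I2) at S2 does not overwrite A(I1).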

The constant propagation and common subexpression elimination
analysis is obviously a powerful analysis, but it is very costly in
both time and space. It also is only a partial solution to the
array identification problem.
APPENDIX I
Flow Graph Background


A directed graph is a pair $G = (N, E)$ where $N$ is a set of nodes
and $E$ is a set of edges which constitute a relation on $N$;
i.e., $E \subseteq N \times N$. An edge $(a, b)$ is said to leave node $a$ and
enter node $b$. Also, $a$ is a predecessor of $b$ and $b$ is a successor
of $a$; $a$ is the head of the edge and $b$ is the tail of the edge.

A path from node $c$ to node $d$ in a graph $G = (N, E)$ is a
sequence of nodes $(n_0, n_1, \ldots, n_{k-1})$, $k > 0$, such that
$\{(n_{i-1}, n_i) \mid 1 \le i < k\} \subseteq E$ and $c = n_0$, $d = n_{k-1}$. A cycle is
a path $(n_0, n_1, \ldots, n_{k-1})$ such that $n_0 = n_{k-1}$, $k > 1$. A
node $a$ is said to be an ancestor of $b$ if there is a path from $a$
to $b$; in such a case, $b$ is said to be a descendant of $a$.

A flow graph is a triple $G = (N, E, e)$ such that $(N, E)$ is a
directed graph and $e \in N$, the initial or entry node, has the
property that for every $m \in N$ there exists a path from $e$ to $m$.

A program may be partially represented by use of a flow graph
where the nodes of the graph represent statements of the program
and the edges represent possible flow of control. Within this
thesis, it is assumed that the source program has been transformed
into a flow graph. Thus, the text references the nodes of the flow
graph instead of the statements of the program.
It is assumed that all graphs are flow graphs (i.e., have
single entry points). If a graph, $G$, has a set of entry points
$\{e_i\}$, then a flow graph $G'$ can be produced by creating a new
node, $d$, and adding the set of edges $\{(d, e_i)\}$; $d$ becomes the new
unique entry node. It is also assumed that the flow graphs have
single exit points. If the flow graph were to have no exit point,
then the corresponding program would constitute an infinite loop,
since there is no way to exit the program. Flow graphs with
multiple exit points can be handled similarly to graphs with
multiple entry points.
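The single-entry construction and the flow-graph property can be sketched directly from the definitions above. The names `reachable` and `make_flow_graph`, and the default fresh node name `"d"`, are illustrative assumptions; nodes may be any hashable values.

```python
from collections import deque

def reachable(edges, start):
    """All nodes m such that there is a path from start to m (start included)."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, work = {start}, deque([start])
    while work:
        n = work.popleft()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return seen

def make_flow_graph(nodes, edges, entries, new_entry="d"):
    """Add a fresh entry node with an edge to every original entry point,
    then check that every node is reachable from it."""
    if new_entry in nodes:
        raise ValueError("entry node name must be fresh")
    nodes2 = set(nodes) | {new_entry}
    edges2 = set(edges) | {(new_entry, e) for e in entries}
    if reachable(edges2, new_entry) != nodes2:
        raise ValueError("some node is unreachable from the entry")
    return nodes2, edges2, new_entry

nodes, edges, entry = make_flow_graph({1, 2, 3}, {(1, 3), (2, 3)}, [1, 2])
print(sorted(e for e in edges if e[0] == entry))  # [('d', 1), ('d', 2)]
```

Multiple exit points can be merged the same way by adding a fresh exit node and running the reachability check on the reversed edges.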
APPENDIX II
Glossary Of Terms


A few useful definitions are collected here for reference
purposes. When applicable, they are repeated in the section where
they are used.

A reference point of a variable V is a node in which data
stored in V is "used." Examples are nodes corresponding to
statements in which V is:

•  used  in  an  expression, 

•  used  as  the  parameter  of  a  DO  loop,  or 

•  used  in  an  output  statement. 

An assign point of a variable V is a node in which V is
modified. Examples are nodes corresponding to statements
in which V is:

• assigned in an assignment statement,

• used in an input statement, or

• used as the index of a DO loop.

A compute point of an expression X is a node in which the
value of X is computed.

A defining variable of an expression X is a variable whose
value is used in the computation of the expression.
A total expression in a program is a "complete" expression
that occurs as an entire entity; i.e., a total expression is not
just a subexpression of a larger expression. For instance, in the
assignment statement

A = B*(C+D)/F

the expression "B*(C+D)/F" is a total expression, whereas "C + D" is
not (unless it appears somewhere else in the program as a total
expression).


An expression, X, computed at node n is recalculable at node m
if node n dominates node m and all paths from node n to node m are
free of assign points of defining variables of X.

Two expressions are tree equivalent if their corresponding
parse trees are identical. For instance, assume left-to-right
evaluation in a total expression such as

A + B + C (this will be referred to as expression X).

The expression A+B is tree equivalent to the subexpression A+B in
expression X, but B+C is not tree equivalent to the "subexpression"
B+C of expression X. The reason is that in the parse tree of
expression X, B+C does not appear as a subexpression.
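Tree equivalence can be illustrated by building left-to-right parse trees and comparing subtrees. This sketch assumes single-letter operands and strict left-to-right evaluation of every operator (no precedence), matching the assumption in the example; `parse_left_to_right`, `subtrees`, and `tree_equivalent_to_sub` are illustrative names.

```python
def parse_left_to_right(expr):
    """Parse a chain like 'A+B+C' into a left-associated tree
    (('A', '+', 'B'), '+', 'C'). Single-letter operands, no parentheses."""
    tokens = [t for t in expr if not t.isspace()]
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tree, tokens[i], tokens[i + 1])
    return tree

def subtrees(tree):
    # Every node of the parse tree, including the whole tree and its leaves.
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[0])
        yield from subtrees(tree[2])

def tree_equivalent_to_sub(e, x):
    """Is expression e tree equivalent to some subexpression of x?"""
    return parse_left_to_right(e) in set(subtrees(parse_left_to_right(x)))

print(tree_equivalent_to_sub("A+B", "A+B+C"))  # True
print(tree_equivalent_to_sub("B+C", "A+B+C"))  # False
```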


A flow graph is reducible iff every cycle of the flow graph is
a single-entry cycle. This is not the standard definition of
reducible, but is equivalent ([HEC72], [HEC74]).
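Reducibility can also be tested with the standard T1/T2 transformations, a different but equivalent characterization from the cycle-based definition above: T1 deletes a self-loop, T2 merges a non-entry node that has a unique predecessor into that predecessor, and the graph is reducible iff this collapses it to a single node. The sketch below is illustrative (`is_reducible` is a hypothetical name) and assumes a flow graph in which every node is reachable from the entry.

```python
def is_reducible(nodes, edges, entry):
    """T1/T2 collapsing test on a flow graph (all nodes reachable from entry)."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for a, b in edges:
        succs[a].add(b)
        preds[b].add(a)
    changed = True
    while changed:
        changed = False
        for n in list(succs):
            if n not in succs:              # already merged away in this pass
                continue
            if n in succs[n]:               # T1: delete self-loop n -> n
                succs[n].discard(n)
                preds[n].discard(n)
                changed = True
            if n != entry and len(preds[n]) == 1:
                (p,) = preds[n]             # T2: merge n into its unique predecessor
                for s in succs[n]:
                    preds[s].discard(n)
                    preds[s].add(p)
                    succs[p].add(s)
                succs[p].discard(n)
                del succs[n], preds[n]
                changed = True
    return len(succs) == 1

# A single-entry loop {1, 2} versus the classic two-entry cycle {1, 2}.
print(is_reducible({0, 1, 2, 3}, {(0, 1), (1, 2), (2, 1), (2, 3)}, 0))  # True
print(is_reducible({0, 1, 2}, {(0, 1), (0, 2), (1, 2), (2, 1)}, 0))     # False
```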